Core Concepts
This section explains how GENEALOGIX works: the flexible vocabularies, entities, and evidence model that make GLX different from traditional genealogy formats.
Archive-Owned Vocabularies
Why Archive-Level Control?
Unlike traditional genealogy formats with fixed type systems, GENEALOGIX uses archive-owned controlled vocabularies. Each archive defines its own valid types in vocabulary files, combining standardization with flexibility.
This is GENEALOGIX's most powerful feature: each archive controls its own type system.
Freedom and Flexibility
Each archive defines exactly the types it needs. You can add custom entries to any vocabulary (event types, relationship types, place types, and more):
- Traditional Genealogy: Use standard types (marriage, parent-child, adoption)
- Colonial History: Add custom types (indenture, manumission, land_grant)
- Religious Studies: Define custom events (ordination, investiture, pilgrimage)
- Biography: Create domain-specific relationships (mentor-mentee, patron-artist)
- Local History: Track community roles (town_selectman, guild_master)
- Maritime Research: Add ship_departure, shipwreck, port_arrival events
- Any research domain with people, events, and relationships
Autonomy Without Chaos
You get both standardization AND flexibility:
- Standard starter vocabularies: New archives begin with common genealogy types
- Extend as needed: Add custom types specific to your research
- Archive-owned: No central committee, no approval process
- Git-versioned: Vocabulary changes tracked with your data
- Validated: The CLI ensures all used types are defined
- Collaborative: Teams discuss and agree on types within the archive
Standard Vocabulary Files
When you create an archive with glx init or glx import, standard vocabulary files are automatically created (by default in a vocabularies/ directory, though they can live anywhere in the archive):
| File | Contents |
|---|---|
| event-types.glx | Birth, death, baptism, immigration, etc. |
| relationship-types.glx | Marriage, parent-child, adoption, etc. |
| place-types.glx | Country, city, parish, etc. |
| source-types.glx | Vital records, census, church registers, etc. |
| repository-types.glx | Archive, library, church, etc. |
| media-types.glx | Photo, document, audio, etc. |
| participant-roles.glx | Subject, witness, godparent, etc. |
| confidence-levels.glx | High, medium, low, disputed |
| gender-types.glx | Male, female, unknown, other |
| person-properties.glx | Person properties (name, occupation, etc.) |
| event-properties.glx | Event properties |
| relationship-properties.glx | Relationship properties |
| place-properties.glx | Place properties |
| media-properties.glx | Media properties |
| repository-properties.glx | Repository properties |
| source-properties.glx | Source properties |
| citation-properties.glx | Citation properties |
Property Vocabularies
Property vocabularies are a special category that defines the custom properties available for each entity type. They are critical for the assertion model (see Assertion-Aware Data Model below).
Property vocabularies define:
- What properties can exist on entities (name, occupation, residence, etc.)
- Property value types (string, date, integer, boolean, or references to other entities)
- Whether properties are temporal (can change over time)
- Whether properties support multiple values
- Structured fields for complex properties (name → given, surname, prefix, suffix)
Example from person-properties.glx:
person_properties:
  name:
    label: "Name"
    description: "Person's name as recorded"
    value_type: string
    temporal: true
    fields:
      given:
        label: "Given Name"
        description: "Given/first name(s)"
      surname:
        label: "Surname"
        description: "Family name"
  occupation:
    label: "Occupation"
    description: "Profession or trade"
    value_type: string
    temporal: true
  residence:
    label: "Residence"
    description: "Place of residence"
    reference_type: places
    temporal: true

How to Add Custom Types
Adding custom types is straightforward:
1. Edit the appropriate vocabulary files
# participant-roles.glx
participant_roles:
  # ... standard roles ...
  # Your custom roles
  apprentice:
    label: "Apprentice"
    description: "Person learning a trade"
    applies_to:
      - event
  master:
    label: "Master"
    description: "Person teaching a trade"
    applies_to:
      - event

# event-types.glx
event_types:
  # ... standard types ...
  # Your custom type
  apprenticeship:
    label: "Apprenticeship"
    description: "Beginning of apprenticeship training"
    category: "occupation"

2. Use the new types in your data
# events/event-john-apprentice.glx
events:
  event-john-apprentice:
    type: apprenticeship
    date: "1865-03-15"
    place: place-leeds
    participants:
      - person: person-john-smith
        role: apprentice
      - person: person-thomas-brown
        role: master

3. Validate your archive
glx validate

The validator will confirm that all types are defined and all references are valid.
Validation Behavior
The glx validate command enforces vocabulary consistency with different severity levels:
Errors (must be fixed):
- Entity references that don't exist (person, place, source, etc.)
- Vocabulary type references that aren't defined (event_types, relationship_types, etc.)
- Property references where the property is defined with reference_type but the referenced entity doesn't exist
Warnings (flexible):
- Unknown properties not defined in property vocabularies
- Unknown assertion properties not defined in property vocabularies
- Warnings allow rapid data entry and emerging properties without breaking validation
This policy balances strictness (broken references are errors) with flexibility (unknown properties generate warnings, not errors).
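The severity policy above can be sketched in a few lines of Python. This is an illustration only, not the actual glx validate implementation: the names Issue and check_properties and the in-memory dictionary layout are assumptions made for this example.

```python
from typing import NamedTuple


class Issue(NamedTuple):
    severity: str  # "error" or "warning"
    message: str


def check_properties(entity_id, properties, vocabulary, entity_ids):
    """Apply the policy above to one entity's properties (hypothetical helper).

    vocabulary: parsed property vocabulary, e.g. the person_properties mapping.
    entity_ids: maps a reference type such as "places" to the set of known IDs.
    """
    issues = []
    for name, raw in properties.items():
        definition = vocabulary.get(name)
        if definition is None:
            # Unknown properties are tolerated: warn, do not fail.
            issues.append(Issue("warning", f"{entity_id}: unknown property '{name}'"))
            continue
        ref_type = definition.get("reference_type")
        if ref_type is None:
            continue  # plain value, nothing to resolve
        # A value is either a shorthand string or a list of {value, date} entries.
        entries = raw if isinstance(raw, list) else [{"value": raw}]
        for entry in entries:
            target = entry["value"]
            if target not in entity_ids.get(ref_type, set()):
                # Broken references are always errors.
                issues.append(
                    Issue("error", f"{entity_id}: {name} -> missing {ref_type} '{target}'")
                )
    return issues


vocabulary = {
    "occupation": {"value_type": "string"},
    "residence": {"reference_type": "places"},
}
issues = check_properties(
    "person-john-smith",
    {"occupation": "blacksmith", "residence": "place-nowhere", "nickname": "Jack"},
    vocabulary,
    {"places": {"place-leeds"}},
)
for issue in issues:
    print(issue.severity, issue.message)
```

In this sketch the missing place produces an error while the undefined nickname property produces only a warning, so data entry can continue while the broken reference is flagged for repair.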
This flexibility makes GLX suitable for traditional family history, local and community history, biographical research, prosopography (collective biography), historical demography, and any research involving people, events, and relationships.
Entity Relationships
GENEALOGIX uses 9 core entity types that form an interconnected web representing genealogical research:
Core Entities
Person ←→ Relationship ←→ Person
Person ←→ Event ←→ Place
Source ←→ Citation → Assertion → Person/Event/Place
Repository → Source
Media → (any entity)

The 9 Entity Types:
- Person - Individuals in the family archive
- Relationship - Connections between people (marriage, parent-child, etc.)
- Event - Occurrences in time and place (birth, death, marriage, immigration)
- Place - Geographic locations with hierarchical structure
- Source - Information sources (books, records, websites, databases)
- Citation - Specific references within sources
- Repository - Institutions holding sources (archives, libraries, churches)
- Media - Digital objects (photos, documents, audio, video)
- Assertion - Evidence-based conclusions about facts
Validation Dependencies
These relationships create validation requirements that ensure archive integrity. The glx validate command enforces referential integrity (citations → sources, assertions → citations/sources/media, events → places, participants → persons, relationships → persons). See Validation Behavior for complete validation policy.
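As a rough illustration of what referential integrity means here, the Python sketch below checks three of the listed dependencies (citations → sources, events → places, participants → persons) against an archive loaded into nested dictionaries. The function name find_broken_references and the dictionary layout are assumptions for this example; the real validator may be organized quite differently.

```python
def find_broken_references(archive):
    """Return (entity_id, field, missing_id) for each dangling reference.

    Only three of the dependencies listed above are checked here.
    """
    broken = []

    def require(owner, field, target, collection):
        # A reference is broken when it names an ID absent from its collection.
        if target is not None and target not in archive.get(collection, {}):
            broken.append((owner, field, target))

    for cid, citation in archive.get("citations", {}).items():
        require(cid, "source", citation.get("source"), "sources")
    for eid, event in archive.get("events", {}).items():
        require(eid, "place", event.get("place"), "places")
        for participant in event.get("participants", []):
            require(eid, "person", participant.get("person"), "persons")
    return broken


archive = {
    "sources": {"source-gro-register": {}},
    "places": {"place-leeds": {}},
    "persons": {"person-john-smith": {}},
    "citations": {"citation-birth-certificate": {"source": "source-gro-register"}},
    "events": {
        "event-john-apprentice": {
            "place": "place-leeds",
            "participants": [
                {"person": "person-john-smith", "role": "apprentice"},
                {"person": "person-thomas-brown", "role": "master"},
            ],
        }
    },
}
print(find_broken_references(archive))
```

Here person-thomas-brown is referenced as a participant but never defined, so it is the only reference reported.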
Data Types
GENEALOGIX uses fundamental data types throughout the specification for entity properties and values.
Primitive Types
String
A sequence of Unicode characters. Strings are the default type when no specific type is specified in property definitions.
Example:
name:
  value: "John Smith"
  fields:
    given: "John"
    surname: "Smith"
occupation: "blacksmith"

Integer
A whole number (positive, negative, or zero). Used for numeric values like population counts.
Example:
population: 5000

Boolean
A true/false value.
Example:
verified: true
primary_source: false

Date
A calendar date or fuzzy date specification. GENEALOGIX uses YYYY-MM-DD format for precise dates combined with FamilySearch-inspired keywords for fuzzy dates. Dates may include an optional calendar prefix for non-Gregorian calendar systems (e.g., JULIAN 1731-03-15).
Date Format Standard
GENEALOGIX uses a hybrid date format combining:
- YYYY-MM-DD dates for precise calendar dates
- FamilySearch-inspired keywords for approximate, ranged, and calculated dates
This format supports both precise dates and fuzzy/approximate dates commonly encountered in genealogical research.
Format Specification
Simple Dates:
- YYYY - Year only (4 digits required, e.g., 1850, 2020, 0047)
- YYYY-MM - Year and month (e.g., 1850-03, 2020-12)
- YYYY-MM-DD - Full date (e.g., 1850-03-15, 2020-12-31)
Keyword Modifiers (FamilySearch-inspired):
Approximate Dates:
- ABT YYYY - About/approximately (e.g., ABT 1850)
- BEF YYYY - Before (e.g., BEF 1920)
- AFT YYYY - After (e.g., AFT 1880)
- CAL YYYY - Calculated (e.g., CAL 1850)
Date Ranges:
- BET YYYY AND YYYY - Between two dates (e.g., BET 1880 AND 1890)
- FROM YYYY TO YYYY - Range with start and end (e.g., FROM 1900 TO 1950)
- FROM YYYY - Open-ended range from a start date (e.g., FROM 1900)
Interpreted Dates:
- INT YYYY-MM-DD (original text) - Interpreted from original source (e.g., INT 1850-03-15 (March 15th, 1850))
Important Notes
- Year Format: Years must be exactly 4 digits. Pad with zeros for years before 1000 CE (e.g., 0047 for year 47, 0800 for year 800).
- Date Format: GENEALOGIX uses YYYY-MM-DD format (e.g., 1850-03-15 for March 15, 1850). This is the international standard for date representation, chosen for its clarity and sortability.
- Keywords vs Full Format: GENEALOGIX uses keywords inspired by the FamilySearch Normalized Date Format, but the underlying date representation uses YYYY-MM-DD, not the full FamilySearch format.
- Keyword Combinations: Keywords can be combined with any simple date format (e.g., ABT 1850, ABT 1850-03, ABT 1850-03-15).
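The formats and keywords in this section can be recognized with a handful of regular expressions. The Python sketch below is one possible reading of the specification, not the reference parser: the function name parse_glx_date and the shape of the returned dictionary are assumptions, INT is assumed to accept any simple date, and only structure is checked (a month of 13 would still match). Calendar prefixes are not handled here.

```python
import re

# A simple date: YYYY, YYYY-MM, or YYYY-MM-DD (the year is always 4 digits).
SIMPLE = r"\d{4}(?:-\d{2}(?:-\d{2})?)?"

PATTERNS = [
    ("simple", re.compile(rf"(?P<start>{SIMPLE})")),
    ("modifier", re.compile(rf"(?P<keyword>ABT|BEF|AFT|CAL) (?P<start>{SIMPLE})")),
    ("between", re.compile(rf"BET (?P<start>{SIMPLE}) AND (?P<end>{SIMPLE})")),
    ("range", re.compile(rf"FROM (?P<start>{SIMPLE})(?: TO (?P<end>{SIMPLE}))?")),
    ("interpreted", re.compile(rf"INT (?P<start>{SIMPLE}) \((?P<text>.+)\)")),
]


def parse_glx_date(text):
    """Return a dict describing the date, or None if the structure is invalid."""
    for kind, pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            parts = {k: v for k, v in match.groupdict().items() if v is not None}
            return {"kind": kind, **parts}
    return None


for example in ["1850-03", "ABT 1850", "FROM 1900", "BET 1880 AND 1890", "47"]:
    print(example, "->", parse_glx_date(example))
```

A string such as "47" is rejected because the year is not zero-padded to four digits, which matches the Year Format note above.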
Date Examples
# Precise dates on an event
date: "1850-03-15" # Full date
date: "1850-03" # Year and month
date: "1850" # Year only
date: "0047" # Year 47 AD (zero-padded)
# Approximate dates
date: "ABT 1850" # About 1850
date: "BEF 1920" # Before 1920
date: "AFT 1880-06" # After June 1880
# Date ranges
residence:
  - value: "place-leeds"
    date: "FROM 1900 TO 1950" # Lived in Leeds 1900-1950
  - value: "place-london"
    date: "FROM 1950" # Lived in London from 1950 onward
# Fuzzy dates
date: "BET 1880 AND 1890" # Between 1880 and 1890
# Calculated dates
date: "CAL 1850" # Calculated from other evidence
# Interpreted dates
date: "INT 1850-03-15 (15th March 1850)" # Original text preserved

Non-Gregorian Calendar Dates
Dates from non-Gregorian calendar systems use a calendar prefix before the date body. Gregorian is the default — no prefix is needed for the vast majority of dates.
Supported calendars:
| Prefix | Calendar | GEDCOM Equivalent |
|---|---|---|
| (none) | Gregorian (default) | @#DGREGORIAN@ |
| JULIAN | Julian calendar | @#DJULIAN@ |
| HEBREW | Hebrew calendar | @#DHEBREW@ |
| FRENCH_R | French Republican calendar | @#DFRENCH R@ |
Format: CALENDAR date-body where date-body is any valid date format (simple, keyword, or range).
# Gregorian dates (default — no prefix)
date: "1731-03-15"
# Julian calendar dates
date: "JULIAN 1731-03-15" # Julian March 15, 1731 (≠ Gregorian March 15)
date: "JULIAN ABT 1731" # About 1731, Julian calendar
date: "JULIAN 1731-03" # Julian March 1731
# Hebrew calendar dates (raw month names preserved)
date: "HEBREW 15 TSH 5765" # 15 Tishrei 5765
# French Republican calendar dates (raw month names preserved)
date: "FRENCH_R 1 VEND 0012" # 1 Vendemiaire Year 12

Design notes:
- No calendar conversion is performed. Dates are preserved exactly as the source recorded them, consistent with the evidence-first methodology. A Julian date is stored as Julian, not converted to Gregorian.
- Gregorian is the default. Dates without a prefix are Gregorian. The GREGORIAN prefix is never written.
- Hebrew and French Republican dates preserve raw month names (e.g., TSH, VEND) because GENEALOGIX does not parse non-Gregorian month names into structured dates.
- Unknown calendars are preserved. If a GEDCOM file uses a non-standard calendar escape, the calendar name is preserved as a prefix (with spaces normalized to underscores).
- Calendar prefixes align with GEDCOM 7.0 calendar names. GEDCOM 5.5.1 escape sequences (e.g., @#DJULIAN@) are converted to the equivalent prefix on import.
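The prefix rules above can be illustrated with a short Python sketch. It is a hedged reading of the design notes rather than the importer's actual code: the function names split_calendar and prefix_from_gedcom_escape are hypothetical, and the rule used to spot an unknown prefix (a leading uppercase token that is not a date keyword) is an assumption.

```python
import re

# Keywords from the date format; a leading keyword is never a calendar prefix.
DATE_KEYWORDS = {"FROM", "TO", "ABT", "BEF", "AFT", "BET", "AND", "CAL", "INT"}
_PREFIXED = re.compile(r"([A-Z][A-Z_]*) (.+)")


def split_calendar(text):
    """Return (calendar, date_body); calendar is None for the Gregorian default."""
    match = _PREFIXED.fullmatch(text)
    if match and match.group(1) not in DATE_KEYWORDS:
        return match.group(1), match.group(2)
    return None, text


def prefix_from_gedcom_escape(escape):
    """Convert a GEDCOM 5.5.1 escape such as '@#DFRENCH R@' to a prefix."""
    match = re.fullmatch(r"@#D(.+)@", escape)
    if match is None:
        raise ValueError(f"not a calendar escape: {escape!r}")
    # Spaces in the calendar name are normalized to underscores.
    name = match.group(1).strip().replace(" ", "_")
    # The GREGORIAN prefix is never written, so it maps to "no prefix".
    return None if name == "GREGORIAN" else name


print(split_calendar("JULIAN ABT 1731"))
print(split_calendar("ABT 1850"))
print(prefix_from_gedcom_escape("@#DFRENCH R@"))
```

No conversion between calendars is attempted: the date body is returned exactly as written, in line with the first design note.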
Date Validation
GENEALOGIX validates date formats at three levels:
- Structure: Dates must follow the format specifications above
- Keywords: Only the defined keywords (FROM, TO, ABT, BEF, AFT, BET, AND, CAL, INT) are recognized
- Calendar prefixes: Known calendar prefixes (JULIAN, HEBREW, FRENCH_R) are stripped before validating the date body. Unknown prefixes are accepted without warning to allow extensibility.
Invalid date formats will generate validation warnings (not errors), allowing archives with imperfect dates to still load while alerting researchers to potential data quality issues.
Reference Types
Reference types indicate that a property value is a string identifier that must exist as an entity in the archive. References are validated at runtime against the actual entities in the archive.
Supported Reference Types
- persons - Reference to a person entity
- places - Reference to a place entity
- events - Reference to an event entity
- relationships - Reference to a relationship entity
- sources - Reference to a source entity
- citations - Reference to a citation entity
- repositories - Reference to a repository entity
- media - Reference to a media entity
Examples:
# Simple reference to a place
properties:
  residence: "place-leeds"

# Temporal reference to a place (changed over time)
properties:
  residence:
    - value: "place-london"
      date: "FROM 1900 TO 1920"

Properties: Recording Conclusions
What Are Properties?
Properties represent the researcher's current conclusions about an entity. They are the "accepted values" you record as you work:
# persons/person-john-smith.glx
persons:
  person-john-smith:
    properties:
      name:
        value: "John Smith"
        fields:
          given: "John"
          surname: "Smith"
      gender: "male"
      occupation: "blacksmith"
      residence: "place-leeds" # Single-value shorthand; see Temporal Properties for list format

Defined by Property Vocabularies
All properties are defined in property vocabularies (see Archive-Owned Vocabularies above). The person-properties.glx vocabulary defines:
- What properties exist (name, gender, occupation, etc.)
- Their data types (string, date, place reference)
- Whether they can change over time (temporal)
- Whether they have structured fields
Properties Can Exist Without Assertions
Properties can be set without assertions, supporting rapid data entry. You can add assertions later as you research sources. See How Properties and Assertions Work Together for examples.
Temporal Properties
Properties marked as temporal: true in vocabularies can hold multiple values — either dated (for values that change over time) or undated (for multiple values without known dates). They support three formats:
Single Value (for properties that don't change or represent a point in time):
properties:
  gender: "male"
  occupation: "blacksmith"

Dated List (for values that change over time):
properties:
  occupation:
    - value: "blacksmith"
      date: "1880"
    - value: "farmer"
      date: "FROM 1885 TO 1920"
  residence:
    - value: "place-leeds"
      date: "FROM 1850 TO 1900"
    - value: "place-london"
      date: "FROM 1900 TO 1920"

Undated List (for multiple values without date information):
# An obituary lists occupations but no dates
properties:
  occupation:
    - value: "teacher"
    - value: "school principal"
    - value: "county superintendent"

This is common when a source (like an obituary, biographical sketch, or family letter) mentions multiple values but doesn't specify when each applied. The list format captures all known values without forcing artificial dates.
Each list entry includes:
- value - The property value, conforming to the property's value_type or reference_type
- date - Optional date string specifying when the value applied
Dated and undated entries can be mixed in the same list — use dates where you have them, omit where you don't.
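Code that reads temporal properties has to cope with all three formats. One way to do that, sketched below in Python, is to normalize every format into a single list shape before further processing. The helper name normalize_temporal is hypothetical, and the sketch deliberately ignores structured properties that carry a fields mapping.

```python
def normalize_temporal(raw):
    """Return a list of {"value", "date"} entries for any of the three formats.

    A single value becomes a one-entry list with no date; list entries keep
    their date when present and get date=None otherwise.
    """
    if isinstance(raw, list):
        return [{"value": entry["value"], "date": entry.get("date")} for entry in raw]
    return [{"value": raw, "date": None}]


single = normalize_temporal("blacksmith")
mixed = normalize_temporal(
    [
        {"value": "blacksmith", "date": "1880"},
        {"value": "farmer"},  # undated entry mixed into a dated list
    ]
)
print(single)
print(mixed)
```

After normalization, dated and undated entries can be handled by the same loop, with a missing date represented explicitly as None.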
Structured Properties
Properties can have structured fields for complex data. There are three usage patterns:
1. Value only (simple properties):
properties:
  occupation: "blacksmith"
  religion: "Church of England"

2. Value + Fields (preserve original while providing structure):
properties:
  name:
    value: "Dr. John Smith Jr."
    fields:
      type: "birth"
      prefix: "Dr."
      given: "John"
      surname: "Smith"
      suffix: "Jr."

The value field preserves the original recorded form, while fields provide structured access to components. This is the recommended approach for most structured properties.
3. Fields only (when there's no natural single-value representation):
properties:
  crop:
    fields:
      top: 450
      left: 100
      width: 800
      height: 200

Notes Field
All entities support an optional notes field for free-form text:
persons:
  person-john-smith:
    properties:
      name:
        value: "John Smith"
        fields:
          given: "John"
          surname: "Smith"
    notes: |
      Research notes about this person.
      Questions for future investigation.

Use notes to:
- Document research decisions and uncertainties
- Record questions for future investigation
- Provide context not captured elsewhere
How Properties Complement Assertions
Properties and assertions work together:
- Properties = "What we currently believe"
- Assertions = "Why we believe it, with evidence"
Properties can be recorded quickly during initial data entry. Assertions document the research trail explaining why those properties have their values. Multiple assertions can support a single property, or present conflicting evidence about what the property value should be.
Assertion-Aware Data Model
See Also: For the complete assertion entity specification, see Assertion Entity.
The Problem with Traditional Models
Traditional genealogy software stores conclusions directly:
Person: John Smith
Birth: January 15, 1850
Place: Leeds, Yorkshire

This approach loses the critical distinction between evidence (what sources say) and conclusions (what we believe). If conflicting evidence emerges, there's no clear way to represent uncertainty or evaluate source quality.
GENEALOGIX Solution: Assertions
GENEALOGIX separates evidence from conclusions using assertions. An assertion is an evidence-backed claim about a specific fact:
# assertions/assertion-john-birth.glx
assertions:
  assertion-john-birth:
    subject:
      event: event-birth-john
    property: date
    value: "1850-01-15"
    citations:
      - citation-birth-certificate
      - citation-baptism-record
    confidence: high

How Assertions Work
Core fields:
- subject: Typed reference to the entity this assertion is about (person, event, relationship, place)
- property: The property being asserted (references property vocabulary)
- value: The concluded value of the property
- citations, sources, or media: Evidence supporting this assertion (at least one required)
- confidence: How certain we are based on evidence quality
- status: Research state of the assertion (e.g., proven, speculative, disproven), independent of confidence
The property field references property vocabularies:
# The property "occupation" must be defined in person-properties.glx
assertion-john-occupation:
  subject:
    person: person-john-smith
  property: occupation # Validated against person_properties vocabulary
  value: "blacksmith"
  citations: [citation-trade-directory]

The validator checks that occupation is defined in person-properties.glx. This ensures consistency between properties and assertions.
Evidence-Based Claims
Assertions must cite their evidence:
# citations/citation-birth-certificate.glx
citations:
  citation-birth-certificate:
    source: source-gro-register
    properties:
      locator: "Certificate 1850-LEEDS-00145"
      text_from_source: "John Smith, born January 15, 1850"

Every assertion requires at least one citation, source, or media reference, creating an audit trail from conclusion back to original evidence.
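The "at least one" rule is simple enough to state as a predicate. The Python sketch below assumes an assertion has been loaded into a dictionary with optional citations, sources, and media lists; the helper name has_evidence is hypothetical.

```python
EVIDENCE_FIELDS = ("citations", "sources", "media")


def has_evidence(assertion):
    """True when at least one evidence field is present and non-empty."""
    return any(assertion.get(field) for field in EVIDENCE_FIELDS)


supported = {
    "subject": {"event": "event-birth-john"},
    "property": "date",
    "value": "1850-01-15",
    "citations": ["citation-birth-certificate"],
}
unsupported = {
    "subject": {"event": "event-birth-john"},
    "property": "date",
    "value": "1850-01-15",
    "citations": [],
}
print(has_evidence(supported), has_evidence(unsupported))
```

An empty list counts as no evidence in this sketch, on the assumption that an assertion listing zero citations provides no audit trail.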
Conflicting Evidence
Multiple assertions can exist for the same fact, representing conflicting evidence:
assertions:
  # Assertion based on birth certificate
  assertion-mary-birth-cert:
    subject:
      event: event-birth-mary
    property: date
    value: "1852-03-10"
    citations: [citation-birth-cert]
    confidence: high
    notes: "Primary direct evidence"
  # Assertion based on family Bible
  assertion-mary-birth-bible:
    subject:
      event: event-birth-mary
    property: date
    value: "1852-03-12"
    citations: [citation-family-bible]
    confidence: medium
    notes: |
      Family Bible entry conflicts with certificate.
      Bible likely written from memory later.
      Certificate takes precedence as primary source.

Confidence Levels
Assertions include confidence levels based on evidence quality:
confidence_levels:
  high:
    label: "High Confidence"
    description: "Multiple high-quality sources agree, minimal uncertainty"
  medium:
    label: "Medium Confidence"
    description: "Some evidence supports conclusion, but conflicts or gaps exist"
  low:
    label: "Low Confidence"
    description: "Limited evidence, significant uncertainty"
  disputed:
    label: "Disputed"
    description: "Multiple sources conflict, resolution unclear"

Confidence levels are defined in confidence-levels.glx and can be customized per archive.
Benefits of This Approach
- Multiple Evidence: One assertion can reference multiple citations, showing corroboration
- Conflicting Evidence: Multiple assertions can exist for the same property, documenting disagreements
- Research Transparency: Clear audit trail from source to conclusion
- Confidence Tracking: Assertions express certainty based on evidence quality
- Flexible Data Entry: Properties can be recorded quickly, assertions added during research
- Source Quality: Different evidence can be weighted differently via confidence levels
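Because conflicting assertions are stored side by side, a tool can surface them by grouping assertions on subject and property. The Python sketch below shows one way to do so; find_conflicts is a hypothetical helper, values are assumed to be plain strings, and the date field is ignored, so legitimately different values of a temporal property at different times would also be reported.

```python
from collections import defaultdict


def find_conflicts(assertions):
    """Group assertions by (subject kind, subject ID, property) and keep the
    groups whose assertions disagree on the value."""
    groups = defaultdict(dict)
    for assertion_id, assertion in assertions.items():
        if "property" not in assertion:
            continue  # nothing to compare without a property
        kind, entity_id = next(iter(assertion["subject"].items()))
        key = (kind, entity_id, assertion["property"])
        groups[key][assertion_id] = assertion.get("value")
    return {key: found for key, found in groups.items() if len(set(found.values())) > 1}


assertions = {
    "assertion-mary-birth-cert": {
        "subject": {"event": "event-birth-mary"},
        "property": "date",
        "value": "1852-03-10",
    },
    "assertion-mary-birth-bible": {
        "subject": {"event": "event-birth-mary"},
        "property": "date",
        "value": "1852-03-12",
    },
    "assertion-john-occupation": {
        "subject": {"person": "person-john-smith"},
        "property": "occupation",
        "value": "blacksmith",
    },
}
conflicts = find_conflicts(assertions)
print(conflicts)
```

The sketch only reports disagreements; deciding which value to accept remains a research judgment recorded through confidence levels and notes.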
How Properties and Assertions Work Together
# 1. Quick data entry - just properties
persons:
  person-john:
    properties:
      occupation: "blacksmith"
      residence: "place-leeds"
# 2. Later: Add assertions documenting the evidence
assertions:
  assertion-john-occupation:
    subject:
      person: person-john
    property: occupation
    value: "blacksmith"
    date: "FROM 1870 TO 1890"
    citations:
      - citation-1851-census
      - citation-trade-directory
    confidence: high

Properties record what we believe. Assertions document why we believe it, with evidence. For temporal properties like occupation, the assertion's date field specifies when the value applies.
Existential Assertions
An assertion with only a subject and evidence — no property, value, or participant — is an existential assertion. It simply says: "this entity is evidenced by these sources."
assertions:
  assertion-john-alice-parentage:
    subject:
      relationship: rel-john-alice-parent-child
    citations:
      - citation-1880-census
    confidence: high
    notes: "Census shows John Smith as head of household with Alice listed as daughter"

Adding a date makes it temporal ("this entity existed at this time"):
assertions:
  assertion-john-alice-parentage:
    subject:
      relationship: rel-john-alice-parent-child
    date: "1880"
    citations:
      - citation-1880-census
    confidence: high

When to use existential assertions:
- Relationships — evidence that a parent-child or marriage relationship existed, before asserting specific property values
- Events — confirming an event occurred without yet asserting its date or place
- Places — documenting that a place existed at a given time
Existential assertions are useful during early research phases when you have evidence that an entity exists but haven't yet established specific property values. They let you document the evidence chain immediately, then add property assertions later as research progresses.
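The distinction between existential, temporal existential, and property assertions follows directly from which fields are present. The Python sketch below makes that rule explicit; the function name classify_assertion and the three returned labels are illustrative choices, not terms defined by the specification.

```python
def classify_assertion(assertion):
    """Classify an assertion by the fields it carries (hypothetical labels)."""
    # Any of these fields means the assertion claims something specific.
    if any(field in assertion for field in ("property", "value", "participant")):
        return "property"
    # Otherwise it only says the subject is evidenced, optionally at a time.
    return "temporal-existential" if "date" in assertion else "existential"


existential = {
    "subject": {"relationship": "rel-john-alice-parent-child"},
    "citations": ["citation-1880-census"],
    "confidence": "high",
}
dated = dict(existential, date="1880")
claim = {
    "subject": {"person": "person-john-smith"},
    "property": "occupation",
    "value": "blacksmith",
    "citations": ["citation-trade-directory"],
}
print(classify_assertion(existential), classify_assertion(dated), classify_assertion(claim))
```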
Evidence Chain
GENEALOGIX organizes genealogical evidence in a hierarchical chain from physical sources to conclusions:
Complete Evidence Chain
Repository → Source → Citation → Assertion → Property
    ↓          ↓         ↓           ↓          ↓
Physical     Original Specific   Evidence-   Concluded
Location     Material Reference  Based       Value on
                                 Claim       Entity

Each level provides context and traceability for research. Here's a complete example showing all links in the chain:
# 1. Repository - Physical institution
repositories:
  repository-gro:
    name: General Register Office
    address: "London, England"
# 2. Source - Original document
sources:
  source-birth-register:
    title: England Birth Register 1850
    repository: repository-gro
# 3. Citation - Specific reference
citations:
  citation-john-birth:
    source: source-birth-register
    properties:
      locator: "Volume 23, Page 145, Entry 23"
      text_from_source: "John Smith, born 15 January 1850, Leeds"
# 4. Assertion - Evidence-based conclusion
assertions:
  assertion-john-born:
    subject:
      event: event-birth-john
    property: date
    value: "1850-01-15"
    citations: [citation-john-birth]
    confidence: high
# 5. Event - Birth event with concluded date
events:
  event-birth-john:
    type: birth
    date: "1850-01-15"
    place: place-leeds
    participants:
      - person: person-john-smith
        role: subject

Multiple Citations and Corroboration
Assertions can reference multiple citations, showing corroboration from independent sources:
assertions:
  assertion-smith-occupation:
    subject:
      person: person-john-smith
    property: occupation
    value: "blacksmith"
    citations:
      - citation-1851-census
      - citation-trade-directory
      - citation-parish-record
    confidence: high

Research Notes
Use the notes field to document research decisions, conflicting evidence, and uncertainties:
assertions:
  assertion-disputed-birth:
    subject:
      event: event-birth-john
    property: date
    value: "1850-01-15"
    confidence: medium
    notes: |
      Two conflicting sources:
      - Birth certificate: January 15, 1850 (preferred, higher quality)
      - Baptism record: January 20, 1850 (5-day delay common)
      Certificate takes precedence as primary direct evidence.
      More research needed on baptism delay practices.
    citations:
      - citation-birth-cert
      - citation-baptism-record

Collaboration
Version Control Ready
GENEALOGIX is designed from the ground up for Git version control:
- File-per-entity structure: Each entity in a separate file enables clean, focused diffs
- YAML format: Human-readable, merge-friendly, and Git-optimized
- Entity-level granularity: Changes to one person don't affect other files
- Branch-based research: Isolate hypotheses and investigations in branches
- Collaborative workflows: Multiple researchers can work simultaneously with standard Git merge tools
- Complete audit trail: Git tracks every change, who made it, and when
Change Tracking with Git
Since GENEALOGIX archives are Git repositories, all changes are automatically tracked:
# See complete change history
git log --oneline -- persons/person-john-smith.glx
# See who made what changes
git blame persons/person-john-smith.glx
# Track research progress over time
git log --since="2024-01-01" --until="2024-03-31"

Git provides automatic provenance tracking for all research work, showing when conclusions were drawn and how they evolved over time.
Next Steps
Now that you understand the core concepts and architecture, the next step is understanding how to organize your archive files. See Archive Organization for details on file formats, directory structures, and organization strategies.