Enrich spreadsheets

demografix enrich reads a spreadsheet and predicts gender, age, nationality, or any combination of the three from the name in each row. It appends the predictions as new columns and preserves every existing column, so the result is an enriched file ready for analysis.

Input formats

enrich reads CSV, TSV, JSON, JSONL, and XLSX. The input format is detected from the file extension.
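
Detection by extension amounts to a lookup on the file suffix. Here is a minimal Python sketch that assumes case-insensitive matching. `detect_format` and `FORMATS` are hypothetical names, not part of demografix:

```python
from pathlib import Path

# Hypothetical extension-to-format table mirroring the formats listed above.
FORMATS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".json": "json",
    ".jsonl": "jsonl",
    ".xlsx": "xlsx",
}

def detect_format(path):
    """Return the format implied by the file extension, or raise ValueError."""
    suffix = Path(path).suffix.lower()  # assumption: matching ignores case
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported extension {suffix!r} in {path!r}") from None
```

The sketch rejects an unknown extension before any row is read, which is the cheapest point to fail.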

Enrich a file

Point enrich at a file, choose the services, and write the result with -o:

demografix enrich people.csv -o out.csv --gender --age --nationality --name-col full_name

Each enabled service appends its own columns. --gender adds gender, gender_count, and gender_probability. --age adds age and age_count. --nationality adds ranked country candidates with their probabilities (see Nationality candidates) and nationality_count.
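
The hypothetical `appended_columns` function below gives a rough model of which columns a run adds. It assumes services append in the order gender, age, nationality, because the --dry-run plan lists the gender columns before the age columns. It also assumes each `country_N` column sits next to its `country_N_probability`. Both orderings are assumptions:

```python
def appended_columns(gender=False, age=False, nationality=False, top_n=3):
    """Return the prediction column names a run would append, in order.

    Assumptions: services append in gender, age, nationality order, and
    each country_N column is followed by its country_N_probability.
    """
    columns = []
    if gender:
        columns += ["gender", "gender_count", "gender_probability"]
    if age:
        columns += ["age", "age_count"]
    if nationality:
        for rank in range(1, top_n + 1):
            columns += [f"country_{rank}", f"country_{rank}_probability"]
        columns.append("nationality_count")
    return columns
```

With gender and age enabled, it returns the same five names that the --dry-run plan lists under new columns.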

Choose the name column

enrich needs to know which column holds the name. Identify it with --name-col, using either a header name or a 1-based column index (2 selects the second column):

demografix enrich people.csv -o out.csv --gender --name-col 2
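
Resolving --name-col means matching either a header or a 1-based position. The Python sketch below does that lookup with the hypothetical `resolve_column`. It assumes that an exact header match takes precedence when a header happens to look like a number, which may differ from the tool's actual behavior:

```python
def resolve_column(header, spec):
    """Return the 0-based position of the column that spec refers to.

    spec is a header name or a 1-based index, as --name-col accepts.
    Assumption: an exact header match wins over reading spec as an index.
    """
    if spec in header:
        return header.index(spec)
    if spec.isdigit():
        index = int(spec)
        if 1 <= index <= len(header):
            return index - 1  # convert the 1-based index to 0-based
        raise ValueError(f"column index {index} is outside 1..{len(header)}")
    raise ValueError(f"no column named {spec!r}")

header = ["id", "full_name", "country"]
position = resolve_column(header, "2")  # the second column, full_name
```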

When first and last names are in separate columns, use split mode, which identifies each column separately with --first-name-col and --last-name-col:

demografix enrich people.csv -o out.csv --gender \
  --first-name-col first --last-name-col last

Scope by country

The gender and age services can be scoped to a country. Without one, predictions are global. Apply one country to every row with --country, or read it per row from a column with --country-col:

demografix enrich people.csv -o out.csv --gender --age --name-col full_name --country-col country
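
The per-row choice can be pictured as follows. `country_for_row` is hypothetical and the sample values are illustrative. The sketch also assumes that a blank cell in the --country-col column produces a global prediction:

```python
def country_for_row(row, country=None, country_col=None):
    """Return the country that scopes this row's prediction, or None for global.

    Assumption: a blank cell in country_col means no country (global).
    """
    if country_col is not None:
        value = (row.get(country_col) or "").strip()
        return value or None
    # --country applies the same value to every row; None means global.
    return country

# Illustrative rows only.
rows = [
    {"full_name": "Ada Lovelace", "country": "GB"},
    {"full_name": "Alan Turing", "country": ""},
]
scopes = [country_for_row(r, country_col="country") for r in rows]
```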

Nationality candidates

Nationality returns several country candidates per name, ordered by descending probability. Set how many are appended with --top-n (1 to 5, default 3):

demografix enrich people.csv -o out.csv --nationality --name-col last_name --top-n 5

This appends country_1 through country_5, a matching country_N_probability for each, and nationality_count.
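
Here is a sketch of how ranked candidates could map onto these columns, assuming the API returns (country, probability) pairs. `nationality_columns` is hypothetical and the input values are illustrative. Leaving cells empty when fewer than --top-n candidates come back is also an assumption:

```python
def nationality_columns(candidates, count, top_n=3):
    """Flatten (country, probability) pairs into country_N columns.

    Assumptions: candidates arrive as (country, probability) pairs, and
    ranks with no candidate are written as empty cells.
    """
    if not 1 <= top_n <= 5:
        raise ValueError("--top-n must be between 1 and 5")
    # Highest probability first, truncated to top_n.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_n]
    values = {}
    for rank in range(1, top_n + 1):
        country, probability = ranked[rank - 1] if rank <= len(ranked) else ("", "")
        values[f"country_{rank}"] = country
        values[f"country_{rank}_probability"] = probability
    values["nationality_count"] = count
    return values

# Illustrative input only, not real API output.
row = nationality_columns([("IE", 0.2), ("GB", 0.5), ("US", 0.1)], count=42, top_n=2)
```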

Avoid column collisions

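A fresh run (one without --resume) refuses to overwrite an existing column. If a prediction column name already exists in the input, add a prefix to the appended column names with --prefix. With pred_, for example, gender becomes pred_gender: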

demografix enrich people.csv -o out.csv --gender --name-col full_name --prefix pred_
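
The collision rule can be checked before any API call. The minimal sketch below uses the hypothetical `plan_columns` and models a fresh run only. It assumes --prefix is prepended to every appended column name:

```python
def plan_columns(existing, new, prefix=""):
    """Prefix the new column names and refuse any that already exist.

    Models a fresh run only; assumes --prefix applies to every appended column.
    """
    named = [prefix + name for name in new]
    clashes = [name for name in named if name in existing]
    if clashes:
        raise ValueError("columns already exist: " + ", ".join(clashes) + " (try --prefix)")
    return named

existing = ["id", "full_name", "gender"]
gender_cols = ["gender", "gender_count", "gender_probability"]
planned = plan_columns(existing, gender_cols, prefix="pred_")
```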

Output

For enrich, -o is an output file path, not a format. The output format follows the file extension: .csv, .tsv, .json, .jsonl, or .xlsx.

Preview the cost

--dry-run validates the configuration and prints a plan (input rows, appended columns, and the number of names that would be billed) without calling the API:

demografix enrich people.csv -o out.csv --gender --age --name-col full_name --dry-run
plan
  input        csv, 1200 rows
  output       out.csv
  services     gender, age
  name column  full_name
  country      none (global)
  new columns  gender, gender_count, gender_probability, age, age_count
  cost         2400 names — no API calls made

Each enabled service bills one name per non-empty row. In the plan above, two services over 1200 rows come to 2400 names.
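
The billing arithmetic is simple enough to reproduce when budgeting ahead of a run. The sketch below uses the hypothetical `estimate_cost` and assumes a row counts as non-empty when its name cell contains something other than whitespace:

```python
def estimate_cost(names, services):
    """Names billed: one per non-empty name, per enabled service.

    Assumption: a name counts as non-empty if it has non-whitespace text.
    """
    non_empty = sum(1 for name in names if name and name.strip())
    return non_empty * len(services)

# Illustrative column: two names, one blank cell, one whitespace-only cell.
cost = estimate_cost(["Ada Lovelace", "", "Alan Turing", "   "], ["gender", "age"])
```

Here two non-empty names and two services give 4 billed names. --dry-run remains the authoritative figure.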

Resume after an interruption

A failed request does not abort the run. The other rows are still processed, and the partial output is written. If the quota runs out mid-run, enrich reports how many rows it wrote.

Re-run with --resume to fill only the rows whose prediction columns are still empty. Pass the partial output as the input. In this example, out.csv is both the input and the output:

demografix enrich out.csv -o out.csv --gender --name-col full_name --resume
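
The sketch below shows which rows a resumed run might pick up. `rows_to_fill` is hypothetical. It reads "still empty" literally and treats a row as pending only when all of its prediction columns are empty (None or blank text). The tool's actual rule may differ:

```python
def is_empty(value):
    """None or whitespace-only text counts as empty; numbers such as 0 do not."""
    return value is None or (isinstance(value, str) and not value.strip())

def rows_to_fill(rows, prediction_cols):
    """Return indices of rows whose prediction columns are all still empty.

    Assumption: a row with any prediction filled in is treated as done.
    """
    return [i for i, row in enumerate(rows)
            if all(is_empty(row.get(col)) for col in prediction_cols)]

cols = ["gender", "gender_count", "gender_probability"]
# Illustrative partial output: row 0 was written, rows 1 and 2 were not.
rows = [
    {"full_name": "Ada Lovelace", "gender": "female", "gender_count": "10", "gender_probability": "0.9"},
    {"full_name": "Alan Turing", "gender": "", "gender_count": "", "gender_probability": ""},
    {"full_name": "Grace Hopper"},
]
pending = rows_to_fill(rows, cols)
```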

Summarize a signup file

Enrich a file of signups, then aggregate the result:

demografix enrich signups.csv -o signups_enriched.csv \
  --gender --age --nationality --name-col full_name

The output adds gender, age, and nationality columns to every row. Pivot it in a spreadsheet or notebook to report the demographic mix of the cohort: the gender split, the age distribution, and the leading nationalities.
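
For a notebook-style summary, here is a short Python sketch that uses only the standard library. It assumes the default column names (gender, age, country_1, with no --prefix), skips empty cells, and takes country_1 as each row's leading nationality. The median stands in for the age distribution. `summarize` is a hypothetical helper:

```python
import csv
import io
import statistics
from collections import Counter

def summarize(lines):
    """Gender split, median age, and top-3 leading nationalities of an enriched CSV.

    Assumes default column names (gender, age, country_1) and skips empty cells.
    """
    genders, countries, ages = Counter(), Counter(), []
    for row in csv.DictReader(lines):
        if row.get("gender"):
            genders[row["gender"]] += 1
        if row.get("age"):
            ages.append(float(row["age"]))
        if row.get("country_1"):
            countries[row["country_1"]] += 1
    return {
        "gender_split": dict(genders),
        "median_age": statistics.median(ages) if ages else None,
        "top_nationalities": countries.most_common(3),
    }

# Illustrative rows only; in practice pass open("signups_enriched.csv", newline="").
sample = io.StringIO(
    "full_name,gender,age,country_1\n"
    "A,female,30,GB\n"
    "B,male,40,GB\n"
    "C,female,50,IE\n"
    "D,,,\n"
)
summary = summarize(sample)
```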

To predict a few names without a file, or to pipe a list of names, see Basic usage.