What's the Real Quality of Job Listings? We Scored 1,000 to Find Out
We built a scoring engine, tested it against two frontier LLMs, and the results matched. Here's the full breakdown.
1,000 jobs validated · 3 independent models · 51 detection signals · 0.904 correlation w/ Llama
The Ghost Job Problem
If you've been job hunting recently, you've probably felt it: applications that disappear into a void, listings that never seem to close, and roles that feel like they were filled before you even clicked “Apply.”
These are ghost jobs — listings that exist on career pages but don't represent real, active hiring. Some are “talent pipeline” postings kept open to collect resumes. Some are compliance filings for roles already filled internally. And a lot of them are simply stale listings that nobody bothered to take down.
The cost to job seekers is real: hours spent tailoring resumes, the emotional weight of silence, and a distorted picture of what the market actually looks like. Everyone talks about this problem. We wanted actual numbers. So we went and got them.
Methodology: A 51-Signal Detection Engine
We didn't want to build another black box. Every score traces back to specific, measurable signals: no LLM calls in production, no vibes. The V3 engine evaluates 51 signals across four categories:
Temporal Signals
Listing age, repost frequency, first-seen date, description change history, observation count over time.
Transparency Signals
Salary disclosure, pay range width, job level clarity, team/department named, hiring manager identified.
Employer Context Signals
ATS provider patterns, posting volume consistency, role-to-company-size ratio, description reuse across listings.
Structural Signals
Description length and quality, application method (easy apply vs. ATS), duplicate detection across boards, location specificity.
Each signal contributes to a quality score from 0 to 100, where higher means a healthier, more legitimate listing; in the validation and findings below, we report the inverse as a ghost probability score (GPS), where higher means more likely ghost. The engine uses a flat weighted average with null-skipping: signals that don't apply to a listing are excluded rather than penalized, so sectors where certain signals are unavailable aren't systematically marked down.
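As a minimal sketch, assuming Python: here's how a null-skipping weighted average can be computed. The signal names, weights, and the neutral fallback are illustrative assumptions, not the production configuration.

```python
from typing import Optional

def quality_score(signals: dict[str, Optional[float]],
                  weights: dict[str, float]) -> float:
    """Flat weighted average with null-skipping.

    Each signal is normalized to 0-100. A signal that doesn't apply
    to a listing is None and is dropped from both numerator and
    denominator, so a missing signal never drags the score down.
    """
    num = den = 0.0
    for name, value in signals.items():
        if value is None:  # null-skip: exclude, don't penalize
            continue
        num += weights[name] * value
        den += weights[name]
    return num / den if den else 50.0  # neutral fallback (assumption)

# Example: salary signals don't apply here, so they're skipped entirely
score = quality_score(
    {"listing_age": 80.0, "salary_disclosed": None, "desc_quality": 65.0},
    {"listing_age": 2.0, "salary_disclosed": 1.5, "desc_quality": 1.0},
)
```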
Validation: Does It Actually Work?
If we're going to flag a third of job listings as suspicious, the engine needs to be solid. To test it, we ran a three-way comparison against two independent AI models:
- Llama 3.3 70B (Meta): a frontier open-weight model
- Gemini 2.5 Flash (Google): a fast, high-quality commercial model
Both models were given the same job listing data and asked to independently estimate ghost probability on the same 0-100 scale. Neither had access to our signals or scoring logic — they used their own reasoning.
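To give a concrete feel for the setup, here's a hedged sketch of one such independent call, assuming OpenRouter's OpenAI-compatible endpoint; the prompt wording and numeric parsing are illustrative assumptions, not the exact validation harness.

```python
from openai import OpenAI  # OpenRouter exposes an OpenAI-compatible API

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")

def llm_ghost_probability(listing_json: str, model: str) -> float:
    """Ask a model to independently estimate ghost probability (0-100).

    The prompt and parsing are simplified for illustration; this
    assumes the model replies with a bare number.
    """
    resp = client.chat.completions.create(
        model=model,  # e.g. "meta-llama/llama-3.3-70b-instruct"
        messages=[{
            "role": "user",
            "content": (
                "Estimate the probability (0-100) that this job listing "
                "is a ghost job. Reply with only the number.\n\n"
                + listing_json
            ),
        }],
    )
    return float(resp.choices[0].message.content.strip())
```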
98.2% within 30pp of Llama · 0.904 correlation with Llama · 18 major disagreements per 1,000
Key Findings
1. We're conservative on purpose
The V3 engine shifts the distribution toward lower scores compared to the LLMs, reducing false positives. With a mean ghost score of 38% (vs Llama's 53%), the engine flags fewer listings as high-risk, focusing on the most confident detections. When the engine says “Skip,” you can trust it.
| Metric | Subspace V3 | Llama 3.3 | Gemini 2.5 |
|---|---|---|---|
| Mean | 38% | 53% | 59% |
| Median | 35% | 58% | 65% |
| Std Dev (pp) | 13 | 20.2 | 18 |
2. Old listings are far more likely to be ghosts
The older a listing, the more likely it's a ghost. Listings under 7 days old averaged a GPS of 39, while those open 90+ days averaged 67. The inflection point is around 30 days — after that, ghost risk accelerates sharply.
| Listing Age | Count | Avg GPS | Avg Llama |
|---|---|---|---|
| 0–7 days | 75 | 39 | 27 |
| 8–14 days | 99 | 40 | 28 |
| 15–30 days | 186 | 41 | 40 |
| 31–60 days | 182 | 48 | 52 |
| 61–90 days | 114 | 59 | 61 |
| 90+ days | 337 | 67 | 72 |
3. Salary transparency correlates with real jobs
Listings that disclose salary ranges score significantly lower on the ghost scale. Makes sense: a company that's serious about filling a role has already figured out compensation. Ghost postings — especially compliance and pipeline postings — frequently omit salary because there's no real budget approved.
4. Ghost job patterns differ by ATS provider
Different applicant tracking systems serve different market segments, and ghost patterns vary accordingly. Greenhouse and Lever (popular with startups and growth-stage companies) tend to have lower ghost rates than enterprise ATS platforms, likely because smaller companies have less bureaucratic incentive to keep phantom postings alive.
5. Reposting is a major ghost indicator
A listing that gets taken down and reposted, sometimes with minor description changes, is a strong ghost signal. Our temporal tracking system monitors description hashes over time, catching listings that cycle through apparent "freshness" to game job board ranking algorithms. A sketch of that fingerprinting approach follows.
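As a rough illustration, assuming Python, here's one way such description fingerprinting could work; the normalization rules and the repost heuristic are assumptions, not the production pipeline.

```python
import hashlib
import re

def description_fingerprint(text: str) -> str:
    """Hash a normalized description so trivial edits (casing,
    whitespace) don't disguise a repost. The normalization rules
    here are illustrative assumptions."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def looks_like_repost(observations: list[str]) -> bool:
    """observations: chronological fingerprints for one employer+title.

    A fingerprint that goes away and later resurfaces suggests a
    take-down/repost cycle (a simplified heuristic).
    """
    seen: set[str] = set()
    previous = None
    for fp in observations:
        if fp in seen and fp != previous:
            return True  # same content came back after a gap
        seen.add(fp)
        previous = fp
    return False
```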
How to Spot Ghost Jobs Yourself
Here's what actually works if you want to check manually (a rough scripted version of these checks follows the list):
- Check the posting date. If a listing has been open for 60+ days and it's not a highly specialized role, be cautious. Our data shows a sharp risk increase after 30 days.
- Look for salary information. Listings with disclosed salary ranges are more likely to represent genuine hiring intent.
- Search for the same role. If you find the identical job posted multiple times across different dates or boards, that's a repost pattern — a classic ghost indicator.
- Research the company. If the company recently announced layoffs or a hiring freeze but still has dozens of open postings, treat those listings as likely inactive.
- Check the description quality. Vague, boilerplate descriptions with no specific team, project, or technology mentioned often signal a pipeline posting rather than a real opening.
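A minimal sketch of those manual checks as code, assuming Python; the age and salary thresholds mirror the findings above, but the one-point-per-flag weighting and the description-length cutoff are invented for illustration.

```python
def manual_ghost_check(age_days: int, has_salary: bool,
                       repost_count: int, hiring_freeze: bool,
                       description_words: int) -> int:
    """Crude risk score (0-5) mirroring the manual checklist.

    Thresholds follow the article's findings; the equal weighting
    is an invented simplification.
    """
    flags = 0
    flags += age_days > 30            # risk accelerates after ~30 days
    flags += not has_salary           # no disclosed pay range
    flags += repost_count > 0         # take-down/repost pattern
    flags += hiring_freeze            # layoffs or freeze announced
    flags += description_words < 150  # vague boilerplate (cutoff assumed)
    return flags

# Example: a 75-day-old listing, no salary, reposted once
risk = manual_ghost_check(75, False, 1, False, 120)  # -> 4 of 5 flags
```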
Methodology Details
Dataset: 1,000 stratified job listings sampled from our actively tracked pool across Greenhouse, Lever, Ashby, SmartRecruiters, Workable, and more. The stratification ensured representation across listing age buckets, ATS providers, company sizes, and geographic regions.
Scoring engine: V3 — 51 signals feed into a flat weighted average with null-skipping, producing 6 nutrition categories (freshness, authenticity, listing completeness, employer quality, compensation, role substance) and a final quality score. No machine learning or LLM calls in production scoring — the engine is fully deterministic and auditable.
Validation protocol: Three-way blind comparison. Our V3 engine, Llama 3.3 70B (via OpenRouter), and Gemini 2.5 Flash each independently scored all 1,000 listings. Agreement was measured at the tier level: within 15pp = agree, 16-30pp = minor disagreement, more than 30pp = major disagreement.
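Assuming Python, the tier classification and headline metrics reduce to something like this sketch (statistics.correlation requires Python 3.10+):

```python
from statistics import correlation  # Pearson's r (Python 3.10+)

def agreement_tier(a: float, b: float) -> str:
    """Tier-level agreement between two 0-100 ghost scores,
    per the validation protocol described above."""
    gap = abs(a - b)
    if gap <= 15:
        return "agree"
    if gap <= 30:
        return "minor_disagree"
    return "major_disagree"

def validation_summary(engine: list[float], llm: list[float]) -> dict:
    """Headline metrics: Pearson correlation plus the share of
    listings scored within 30pp of each other."""
    tiers = [agreement_tier(e, l) for e, l in zip(engine, llm)]
    majors = tiers.count("major_disagree")
    return {
        "pearson_r": correlation(engine, llm),
        "within_30pp": 1 - majors / len(tiers),
        "major_disagreements": majors,
    }
```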
Key validation metrics: Pearson correlation of 0.904 with Llama (up from 0.789 in the previous version), 0.877 with Gemini (up from 0.655). 98.2% agreement within 30pp. Full validation report available at thesubspace.io/research.
Check any job listing for free
Paste any job URL. 51 signals. 3 seconds. No signup.