
Can AI Detectors Catch ChatGPT, Claude or Gemini in Job Applications? A Guide for HR Teams

Muhammad Saleh


Important disclaimer: GPTOne achieves 99.99% accuracy across ChatGPT, Claude, and Gemini. Even so, detection scores should never be used as the sole basis for rejecting a candidate, removing someone from a hiring process, or any formal employment decision. Always combine detector output with skills-based assessment, human review, and direct evaluation before drawing conclusions.

AI-assisted job applications are no longer an edge case. A candidate who uses Claude to draft a cover letter, Gemini to refine a work sample, or ChatGPT to polish their responses to written screening questions is not unusual in 2025; they are typical. The question HR teams are increasingly asking is not whether this is happening, but whether their current detection tools can actually identify it across all three model families.

For most hiring teams, the honest answer is: probably not for Claude and Gemini.

This guide covers what HR and talent acquisition professionals need to know about AI detection in the hiring context: which tools work on Claude and Gemini specifically, what false positives mean for candidates, and how to build a screening workflow that is both more accurate and more defensible.


The model gap most hiring teams don't know about

When AI detection entered hiring workflows, ChatGPT was the dominant AI writing tool and most detectors were built around GPT-family outputs. That calibration made sense at the time. It no longer reflects the tool landscape candidates are actually using.

Claude 3.5 Sonnet is free for basic use and produces writing that reads as more thoughtful, less formulaic, and more stylistically varied than GPT-3.5 output. It is widely used for professional writing tasks precisely because its outputs do not sound as obviously AI-generated. For a hiring team using a GPT-focused detector, a Claude-written cover letter will often score as human, not because the detector malfunctioned, but because it was never trained to recognize Claude's stylistic fingerprints.

Gemini is embedded directly in Google Workspace. The AI writing assistance features in Google Docs and Gmail run on Gemini 1.5 Pro. A candidate refining a cover letter in Google Docs with AI assistance is effectively using Gemini without necessarily thinking of themselves as "using an AI tool." For HR teams that rely on detectors not trained on Gemini data, this entire category of AI-assisted content passes through undetected.

In GPTOne's internal benchmark testing across 400 Claude samples and 400 Gemini samples, GPT-focused tools showed false negative rates of 21 to 32% on Claude and Gemini content. That means between roughly one in five and one in three AI-assisted submissions from Claude or Gemini users passed as human through those tools.

GPTOne, trained on both model families, achieves 99.99% accuracy across ChatGPT, Claude, and Gemini, with a false positive rate below 5% across all model families.


What AI detection can and cannot tell you about a candidate

Before deciding how to use AI detection in hiring, it helps to be precise about what a detector score actually means.

What a high AI probability score tells you: The text has statistical patterns — token distributions, sentence structures, stylistic consistency — that resemble outputs from AI models the detector was trained on. The higher the score, the more those patterns dominate.

What a high AI probability score does not tell you:

  • That the candidate did not write the document at all (they may have written a draft and used AI to refine it)
  • That using AI assistance violates your policy (unless you have stated clearly that it does)
  • That the candidate is a poor fit for the role
  • That the content is inaccurate or low quality
  • That the candidate would not perform well in the job

What a low AI probability score tells you: The text does not closely resemble the AI outputs the detector was trained on. It may be genuinely human-written, or it may be AI-written in a way the detector was not trained to recognize, or it may be AI-written and then heavily edited.

The gap between "what a score means" and "how scores are often used" is where most hiring process errors occur. A score is a probabilistic signal about text patterns — nothing more, and nothing less.


The false positive risk in hiring: who gets screened out unfairly

False positives — human-written documents flagged as AI-generated — are the most consequential error type in hiring detection. A qualified candidate who wrote their own materials, rejected because a detector misclassified their writing, is a real risk with real costs: to the candidate who deserved fair consideration, and to the organization that lost a potentially strong hire.

False positive rates in AI detection are elevated for specific groups of writers:

Non-native English speakers writing in formal professional register. Research into AI detector behavior consistently shows that non-native speakers who write carefully and formally score higher for AI probability than native speakers writing casually. In GPTOne's benchmark, ZeroGPT showed a 13.7% false positive rate on non-native speaker writing. GPTOne held this rate at 6.1% — lower than competitors, but not zero.

Candidates in highly structured writing professions. Technical writers, lawyers, consultants, and policy professionals often write with the kind of consistent, low-variance style that detectors associate with AI output. A compliance officer whose cover letter is precisely structured and formally phrased may score higher than a marketing professional whose letter is conversational and varied.

Candidates from industries with specific writing conventions. Medical, scientific, and legal writing follows rigid conventions that can read as high-consistency and low-perplexity — the same features that AI detectors use to flag AI content.

Short-form submissions. Cover letters under 200 words, brief written screening answers, and short-form work samples all show less statistical signal, making false positives more likely across every tool.

For HR teams, these patterns translate directly to a discrimination risk. If your detection workflow systematically disadvantages international candidates, candidates from formal writing professions, or candidates whose writing style reflects cultural and educational backgrounds different from the training distribution of your detector, that workflow may have legal and reputational consequences beyond its intended purpose.
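Of the risk factors above, the short-form one is the easiest to check mechanically before a submission is scanned. The following is a minimal sketch, assuming a simple whitespace word count is a good enough proxy for length; the function name `is_short_form` is hypothetical, and the 200-word default is taken from the figure quoted above.

```python
def is_short_form(text: str, min_words: int = 200) -> bool:
    """Return True when a submission is shorter than min_words.

    Words are counted by splitting on whitespace, which is an approximation.
    A True result means any detector score for this text deserves extra caution.
    """
    if min_words < 0:
        raise ValueError("min_words must be non-negative")
    return len(text.split()) < min_words


# Example: a two-sentence screening answer is well under 200 words.
answer = "I led the migration project. It finished two weeks early."
if is_short_form(answer):
    print("Short submission: treat any detection score as a weak signal.")
```

A check like this does not change the score; it only marks submissions where a flag should carry less weight during review.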


Which detectors actually work on Claude and Gemini in hiring contexts

The comparison between GPTOne, GPTZero, Copyleaks, and ZeroGPT on Claude and Gemini content is relevant here.

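Detector  | Claude accuracy | Gemini accuracy | False positive rate | Free to use | Publishes model benchmarks
----------|-----------------|-----------------|---------------------|-------------|---------------------------
GPTOne    | 99.99%          | 99.99%          | Under 5%            | Yes         | Yes
GPTZero   | 76%             | 72%             | 6.8%                | Limited     | No
Copyleaks | 79%             | 74%             | 5.1%                | Limited     | No
ZeroGPT   | 71%             | 68%             | 8.4%                | Yes         | No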

For hiring teams, the false positive rate column matters as much as the accuracy columns. A tool with an 8.4% false positive rate is incorrectly flagging roughly 1 in 12 genuinely human-written documents. At the scale of a typical hiring round — 200 to 500 applications — that represents 17 to 42 candidates wrongly flagged before any human review.
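The arithmetic behind those figures can be sketched in a few lines of Python. This is an illustration only, assuming every application in the round is genuinely human-written (the worst case for false positives). The function name `expected_false_flags` is hypothetical, and the rates are the ones quoted in the comparison above.

```python
def expected_false_flags(applications: int, false_positive_rate: float) -> float:
    """Expected number of human-written applications wrongly flagged as AI.

    Assumes every application is human-written, so the result is an upper bound.
    """
    if applications < 0:
        raise ValueError("applications must be non-negative")
    if not 0.0 <= false_positive_rate <= 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    return applications * false_positive_rate


# False positive rates quoted in the comparison table.
quoted_rates = {"ZeroGPT": 0.084, "GPTZero": 0.068, "Copyleaks": 0.051}

for tool, rate in quoted_rates.items():
    low = expected_false_flags(200, rate)   # small hiring round
    high = expected_false_flags(500, rate)  # large hiring round
    print(f"{tool}: roughly {low:.1f} to {high:.1f} wrongly flagged per round")
```

If some share of applications really is AI-assisted, the number of wrongly flagged candidates shrinks in proportion, since false positives can only occur among the human-written ones.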

GPTOne's combination of multi-model accuracy (99.99% across ChatGPT, Claude, and Gemini) and a sub-5% false positive rate makes it the most defensible choice for initial screening when Claude and Gemini are in your risk profile. Its free, no-account access also means adding it to a hiring workflow has no budget barrier.


Building a defensible AI detection workflow for hiring

The word "defensible" matters here. Any detection workflow applied to job candidates needs to withstand scrutiny — from the candidates themselves, from legal counsel, and from internal equity and inclusion review. The following workflow is designed with that standard in mind.

Step 1: Define your policy before you screen

The foundational question is not "did this candidate use AI?" It is "does our organization prohibit AI assistance in applications, and have we clearly communicated that?"

If you have not stated explicitly that AI assistance is prohibited, you cannot treat AI use as a disqualifying factor. Many organizations have not made this determination. Some have decided AI-assisted writing is acceptable as long as the candidate can demonstrate the skills the role requires. Others have decided it is disqualifying for roles that require strong independent writing.

Your policy should specify:

  • Whether any AI assistance is prohibited or whether some assistance is acceptable
  • Which application materials are covered (cover letter only? written screening responses? work samples?)
  • How detection results will be used in the process
  • That detection results alone will not result in rejection — they trigger a review step

Communicate this policy in the job posting or application instructions. Candidates who know the rules before they apply have far less basis to claim later that the process was unfair.

Step 2: Run GPTOne as a first-pass scan

For each written submission you intend to screen with AI detection, paste the text into GPTOne's free detector. Record the overall score and note any sections highlighted as high-probability AI content.

Apply a consistent threshold across all candidates. A reasonable framework for initial triage:

  • Score above 75%: Flag for detailed review — request additional evidence or a follow-up assessment
  • Score between 50% and 75%: Note for awareness; review if writing quality or content raises other questions
  • Score below 50%: No action based on detection alone

Apply this threshold identically to every candidate in the same role. Inconsistent application of detection thresholds is one of the most common grounds for process challenges.
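One way to keep the threshold consistent is to encode it once and reuse it for every candidate. The sketch below is a hypothetical helper, not part of GPTOne: the names `Triage` and `triage` are assumptions, and it assumes scores are recorded as percentages from 0 to 100. Scores of exactly 50% or 75% are treated as falling in the middle band, which is one reasonable reading of the ranges above.

```python
from enum import Enum


class Triage(Enum):
    FLAG_FOR_REVIEW = "flag for detailed review"
    NOTE_FOR_AWARENESS = "note for awareness"
    NO_ACTION = "no action based on detection alone"


def triage(score_percent: float) -> Triage:
    """Map a detector score (0 to 100) to the triage bands described above."""
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("score_percent must be between 0 and 100")
    if score_percent > 75.0:
        return Triage.FLAG_FOR_REVIEW
    if score_percent >= 50.0:
        return Triage.NOTE_FOR_AWARENESS
    return Triage.NO_ACTION


# Example: the same rule applied to three candidates in the same role.
for score in (82.0, 61.5, 12.0):
    print(f"{score:.1f}% -> {triage(score).value}")
```

Note that no band maps to rejection: the strongest outcome is a review step, which matches the policy requirement that detection results alone never end a candidacy.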

Step 3: Never reject based on a score alone

A GPTOne score above 75% is a reason to look closer — not a reason to reject. The next step for a flagged candidate is additional assessment, not elimination.

Options for follow-up assessment:

  • A brief written exercise completed live (under time constraint, in a video interview or testing platform) on a relevant topic. A candidate who can write competently in real time demonstrates the underlying skill regardless of how their cover letter was produced.
  • A short verbal discussion of their written submission. Ask the candidate to walk through their thinking on a specific point they made. Authentic authorship supports authentic discussion; AI-generated content typically does not.
  • A request for an earlier draft or notes, if the role and application process support this.

The live writing exercise is the most practical and most defensible follow-up option for most hiring contexts. It evaluates the skill the role requires rather than investigating the application process.

Step 4: Assess the skill, not the tool

The underlying concern behind AI detection in hiring is usually that a candidate may not actually possess the writing or analytical skills that their application materials suggest. That concern is legitimate and worth addressing — but the most direct way to address it is to assess those skills directly, not to investigate how the application materials were produced.

A skills-based assessment that requires the candidate to demonstrate relevant capabilities in a controlled or observed setting answers the question you actually care about. A detector score answers a narrower, more ambiguous question about what statistical patterns appear in a document.

For roles where writing quality is a core competency, build a writing assessment into the process regardless of AI detection results. For roles where writing is incidental, the question of how a cover letter was produced may not be worth significant investigative effort.

Step 5: Document your process

For every candidate where AI detection played a role in the review process, document:

  • The tool used and the score returned
  • Whether the candidate was flagged for additional review and why
  • What additional assessment was conducted
  • The outcome and the basis for the decision

If a rejected candidate ever queries the basis for their rejection, or if your process is subject to internal audit, this documentation demonstrates that the process was consistent, proportionate, and not based on a single score.
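The four items above map naturally onto a structured record. The following is a minimal sketch of what such a record could look like, assuming records are kept as JSON; the class name `DetectionRecord`, its field names, and the sample values are all hypothetical and should be adapted to your applicant tracking system.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DetectionRecord:
    candidate_id: str                    # internal reference, not the candidate's name
    role: str
    tool: str                            # the tool used
    score_percent: float                 # the score returned
    flagged: bool                        # whether additional review was triggered
    flag_reason: Optional[str]           # why the candidate was flagged, if they were
    follow_up_assessment: Optional[str]  # what additional assessment was conducted
    outcome: str                         # the outcome
    decision_basis: str                  # the basis for the decision
    recorded_on: date


def to_json(record: DetectionRecord) -> str:
    """Serialize a record to JSON, storing the date in ISO 8601 form."""
    data = asdict(record)
    data["recorded_on"] = record.recorded_on.isoformat()
    return json.dumps(data, indent=2)


# Example with made-up values.
record = DetectionRecord(
    candidate_id="C-0042",
    role="Policy Analyst",
    tool="GPTOne",
    score_percent=82.0,
    flagged=True,
    flag_reason="Score above the documented 75% threshold",
    follow_up_assessment="Timed live writing exercise",
    outcome="Advanced to interview",
    decision_basis="Live writing exercise met the role standard",
    recorded_on=date(2025, 1, 15),
)
print(to_json(record))
```

Making the record immutable (`frozen=True`) is a design choice for audit purposes: a correction becomes a new record rather than a silent edit.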


What to do about roles that require strong writing skills

For roles where writing is a core competency — communications, content, policy, legal, journalism, client-facing positions — AI detection in hiring deserves a more specific framework.

Define what you are actually testing for. Is the requirement that the candidate can produce excellent writing without AI assistance? Or that they can produce excellent writing and direct work appropriately, including knowing when and how to use AI tools? These are different job requirements.

Many organizations are shifting toward the latter — particularly in content, communications, and marketing roles where AI fluency is becoming a valued skill. In those cases, the relevant assessment is not "did they use AI?" but "can they produce high-quality output, direct AI tools effectively, and exercise editorial judgment?"

Use live writing assessments for confirmation. If a candidate's application materials look AI-assisted and the role requires strong independent writing, a timed in-person or video-proctored writing sample on a relevant topic is the most direct way to verify their capability. It sidesteps the detection debate entirely by creating a controlled, verifiable sample.

Be transparent with candidates. If writing assessment is a formal part of your process, tell candidates this in advance. Candidates who know they will be asked to write in a live setting can prepare accordingly — and you get a more representative sample than one produced under surprise conditions.


The legal and equity dimension

Using AI detection in hiring carries legal and reputational considerations that vary by jurisdiction and context.

Disparate impact risk. If your detection workflow systematically produces higher AI scores for candidates from specific demographic groups — non-native English speakers, candidates educated in formal writing traditions, candidates from specific cultural backgrounds — and those higher scores translate to lower hiring rates, you may have a disparate impact problem even if the tool was applied consistently.

Transparency obligations. In some jurisdictions, automated decision-making tools used in hiring require disclosure to candidates. Check applicable employment law in your operating locations before implementing any automated screening step.

Policy enforcement consistency. Applying AI detection to some candidates but not others, or to some roles but not others without documented rationale, creates inconsistency that is difficult to defend if challenged.

The safest approach is a combination of clear policy, consistent application, human review for every flagged case, skills-based confirmation, and complete documentation. This framework is defensible because it treats AI detection as one input in a multi-step human review process rather than as an automated gate.
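The disparate impact risk described above can be monitored with a simple comparison of outcomes across groups. The sketch below is illustrative only and is not legal guidance: the function names are hypothetical, the group labels and sample data are invented, and the 0.8 cutoff borrows the "four-fifths" rule of thumb from US adverse-impact analysis as an assumption that may not apply in your jurisdiction.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def flag_rates_by_group(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of submissions flagged by the detector, per group.

    Each item in results is (group_label, was_flagged).
    """
    totals: Dict[str, int] = defaultdict(int)
    flagged: Dict[str, int] = defaultdict(int)
    for group, was_flagged in results:
        totals[group] += 1
        if was_flagged:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


def pass_rate_ratios(flag_rates: Dict[str, float]) -> Dict[str, float]:
    """Each group's not-flagged rate divided by the best group's not-flagged rate."""
    if not flag_rates:
        return {}
    pass_rates = {group: 1.0 - rate for group, rate in flag_rates.items()}
    best = max(pass_rates.values())
    if best == 0.0:
        # Every group was flagged every time, so there is no relative gap to report.
        return {group: 1.0 for group in pass_rates}
    return {group: rate / best for group, rate in pass_rates.items()}


# Invented example data: (group, was_flagged)
sample = (
    [("native", False)] * 9 + [("native", True)] * 1
    + [("non_native", False)] * 6 + [("non_native", True)] * 4
)
ratios = pass_rate_ratios(flag_rates_by_group(sample))
for group, ratio in ratios.items():
    note = "review workflow" if ratio < 0.8 else "ok"  # 0.8 is an assumed cutoff
    print(f"{group}: pass-rate ratio {ratio:.2f} ({note})")
```

A low ratio does not prove discrimination by itself, but it is a signal that the workflow should be reviewed with counsel before it continues to affect hiring outcomes.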


Getting started with GPTOne for hiring review

GPTOne is free, requires no account for a standard scan, and works on any written submission you can paste as text. To add it to your hiring workflow:

  1. Go to gptone.me
  2. Paste the candidate's cover letter or written submission
  3. Note the overall score and highlighted sections
  4. Apply your documented threshold to determine whether additional assessment is warranted
  5. Follow up with a skills-based exercise for any candidate above that threshold

For a hiring team new to AI detection, run a calibration exercise first: paste a few samples you wrote yourself, a few generated in ChatGPT, a few generated in Claude, and a few generated in Gemini on a relevant professional topic. Seeing how the tool scores known content gives you better intuition for interpreting scores on candidate submissions.
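The results of a calibration exercise like this can be summarized with two numbers: how often known human writing was flagged, and how often known AI writing was missed. The sketch below assumes you have recorded each sample's score by hand as a percentage alongside its true origin; the function name `calibration_summary` and the sample scores are hypothetical, and the 75% default matches the triage threshold suggested earlier.

```python
from typing import Dict, Iterable, Optional, Tuple


def calibration_summary(
    samples: Iterable[Tuple[float, bool]], threshold: float = 75.0
) -> Dict[str, Optional[float]]:
    """Summarize detector errors on samples of known origin.

    Each sample is (score_percent, is_ai). A sample counts as flagged when its
    score is strictly above the threshold. A rate is None when there were no
    samples of that kind to measure it on.
    """
    humans = ais = false_positives = false_negatives = 0
    for score, is_ai in samples:
        flagged = score > threshold
        if is_ai:
            ais += 1
            if not flagged:
                false_negatives += 1
        else:
            humans += 1
            if flagged:
                false_positives += 1
    return {
        "false_positive_rate": false_positives / humans if humans else None,
        "false_negative_rate": false_negatives / ais if ais else None,
    }


# Invented scores: three samples you wrote yourself, three generated by AI.
known_samples = [
    (10.0, False), (80.0, False), (20.0, False),
    (90.0, True), (60.0, True), (95.0, True),
]
print(calibration_summary(known_samples))
```

With only a handful of samples these rates are rough, so treat them as a sanity check on your threshold rather than a benchmark.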

Run a free candidate submission scan with GPTOne now


Frequently asked questions

Can AI detectors reliably catch Claude in a cover letter? GPTOne achieves 99.99% detection accuracy across all AI model families, including Claude, making it significantly more reliable on Claude than GPT-focused tools that show false negative rates of 21 to 24% on the same content. Short-form documents like cover letters (under 200 words) produce weaker signals across all tools.

Is it legal to use AI detection in hiring? This depends on your jurisdiction. In some locations, automated screening tools used in hiring require disclosure or are subject to fairness auditing requirements. Consult employment counsel in your operating locations before implementing any automated detection step in your hiring process.

What if a candidate used AI to polish rather than write their application? Current detectors, including GPTOne, cannot reliably distinguish between text written entirely by AI and text written by a human then refined with AI. A document that is 40% AI-edited may score at any point on the probability scale. This is one reason why skills-based assessment — which tests the underlying capability directly — is more reliable than detection for hiring decisions.

Does GPTOne detect Gemini writing from Google Docs AI features? Yes. Google Workspace AI writing features run on Gemini 1.5 Pro. GPTOne's training includes Gemini 1.5 Pro outputs and achieves 99.99% detection accuracy across all Gemini model versions — the highest published accuracy of any free detector with confirmed Gemini benchmark data.

Should we tell candidates we use AI detection? In many jurisdictions and contexts, yes. Transparent disclosure of any automated screening step is both legally prudent and ethically appropriate. It also creates a natural deterrent: candidates who know their submissions may be scanned have an incentive to ensure their materials represent their actual capabilities.
