This study addresses a critical gap in existing research by systematically comparing the performance of five popular large language models (LLMs) in supporting high-quality qualitative research. Our methodology combines a literature review of academic papers from 2020 to 2025 with a proof-of-concept experiment evaluating ScholarAI, ChatGPT-4o, Claude 3.5 Sonnet, NotebookLM and Perplexity on key qualitative analysis tasks. We sought to determine how well these generative artificial intelligence (AI) models meet established standards of methodological rigor in qualitative analysis. Findings reveal significant variation in LLM performance: the models excelled at efficiently retrieving relevant literature, summarizing content and generating insights, but exhibited inconsistencies in contextual comprehension, coding accuracy and depth of critical analysis. These results informed a novel evaluation framework aligning LLM outputs with qualitative research quality criteria, contributing guidance for researchers and practitioners. We recommend that practitioners leverage LLMs to improve productivity while exercising critical oversight of their outputs, and that researchers address ethical concerns and refine evaluation rubrics to ensure responsible AI integration. Overall, this work establishes a foundation for responsible human–AI collaboration in qualitative research by highlighting both the opportunities and challenges of using generative AI to enhance methodological rigor and accessibility.

PAGES
45 – 58
DOI
All content is freely available without charge to users or their institutions. Users are allowed to read, download, copy, distribute, print, search, or link to the full texts of the articles in this journal without asking prior permission of the publisher or the author. Articles published in the journal are distributed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Generative artificial intelligence in qualitative analysis: a critical examination of tools, trust and rigor
Paper