La Era
Apr 9, 2026 · Updated 09:20 AM UTC
Science

ChatGPT struggles to interpret scientific data in doctoral-level tests

A study from Harvard Medical School researchers reveals that while generative AI can reproduce memorized knowledge, it fails significantly when asked to analyze raw scientific data and graphs.

Tomás Herrera

2 min read

Photo: logomakerr.ai

Researchers at Harvard Medical School have identified a clear vulnerability in generative artificial intelligence: its inability to consistently interpret scientific data and graphs. In a study published this month in PLOS One, the team tested ChatGPT against doctoral students in a molecular biology course, finding that the AI struggled with tasks requiring high-level data analysis.

While the authors initially hypothesized that the AI would perform well on memorization-based tasks and falter on complex critical thinking, the results were more nuanced. Doctoral students outperformed the AI across the board, but the gap was largely driven by the AI’s poor performance on basic tasks involving data application and recall.

Data analysis as an AI hurdle

The research team evaluated ChatGPT’s performance on take-home assignments designed for graduate students. Even when using versions of the software specifically optimized for image interpretation, the AI failed to accurately read or synthesize raw scientific data. This finding suggests that current large language models lack the specialized reasoning required to navigate complex visual information in experimental biology.

"We found a striking deficit in ChatGPT’s ability to interpret scientific graphs and raw data in both short-answer and multiple-choice questions," the authors wrote. They noted that while simple prompt engineering improved some scores, it did not bridge the gap in analytical competence.

The study, led by researchers including A.C. Kwong and J.J. Peters, suggests that educators can design out-of-class assessments that are more resistant to AI misuse by focusing on visual data interpretation. By moving away from purely descriptive questions, professors may be able to ensure that students are actually engaging with the material rather than relying on automated shortcuts.

The researchers argue that these findings provide a roadmap for updating academic curricula. By prioritizing assignments that require the synthesis of raw data, departments can maintain academic rigor even as generative tools become more accessible. Their work was supported by a grant from the Harvard Medical School Dean’s Innovation Awards, which focuses on the integration of AI in education and research.
