La Era
Apr 9, 2026 · Updated 09:20 AM UTC
Science

ChatGPT struggles to interpret scientific data in doctoral-level tests

A study from Harvard Medical School researchers reveals that while generative AI can reproduce memorized knowledge, it fails significantly when asked to analyze raw scientific data and graphs.

Tomás Herrera

2 min read

Photo: logomakerr.ai

Researchers at Harvard Medical School have identified a clear vulnerability in generative artificial intelligence: its inability to consistently interpret scientific data and graphs. In a study published this month in PLOS One, the team tested ChatGPT against doctoral students in a molecular biology course, finding that the AI struggled with tasks requiring high-level data analysis.

While the authors initially hypothesized that the AI would perform well on memorization-based tasks and falter on complex critical thinking, the results were more nuanced. Doctoral students outperformed the AI across the board, but the gap was largely driven by the AI’s poor performance on basic tasks involving data application and recall.

Data analysis as an AI hurdle

The research team evaluated ChatGPT’s performance on take-home assignments designed for graduate students. Even when using versions of the software specifically optimized for image interpretation, the AI failed to accurately read or synthesize raw scientific data. This finding suggests that current large language models lack the specialized reasoning required to navigate complex visual information in experimental biology.

"We found a striking deficit in ChatGPT’s ability to interpret scientific graphs and raw data in both short-answer and multiple-choice questions," the authors wrote. They noted that while simple prompt engineering improved some scores, it did not bridge the gap in analytical competence.

The study, led by researchers including A.C. Kwong and J.J. Peters, suggests that educators can design out-of-class assessments that are more resistant to AI misuse by focusing on visual data interpretation. By moving away from purely descriptive questions, professors may be able to ensure that students are actually engaging with the material rather than relying on automated shortcuts.

The researchers argue that these findings provide a roadmap for updating academic curricula. By prioritizing assignments that require the synthesis of raw data, departments can maintain academic rigor even as generative tools become more accessible. Their work was supported by a grant from the Harvard Medical School Dean’s Innovation Awards, which focuses on the integration of AI in education and research.
