A new study from Washington State University challenges the reliability of artificial intelligence in scientific reasoning. Researchers tested ChatGPT against hundreds of hypotheses and found significant gaps between the system's apparent accuracy and its actual performance. The findings, published on March 17, 2026, urge caution for businesses that rely on generative models for critical decisions, where accurate information matters.
The research team evaluated more than 700 hypotheses drawn from business journals published since 2021, sending each one the same prompt 10 times to measure consistency across interactions. This repeated-prompt design exposed flaws that surface-level accuracy metrics tend to obscure.
Initial results looked strong: in the 2025 follow-up test, the model answered correctly 80% of the time. But on true/false questions a random guesser is right about half the time, so after adjusting for guessing the system scored only 60% better than chance. That level of reliability resembles a low D grade rather than strong analytical capability.
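The adjustment described above can be sketched with the standard correction-for-guessing formula; the study's exact method is not given in the article, so this is an assumption. Rescaling so that 0.0 means pure guessing and 1.0 means perfect, an 80% raw score on true/false questions lands at 0.6:

```python
def chance_adjusted_score(raw_accuracy, chance_rate=0.5):
    """Rescale raw accuracy so 0.0 = pure guessing, 1.0 = perfect.

    chance_rate is the expected accuracy of random guessing; for
    true/false questions that is 0.5. (Hypothetical helper, not the
    paper's published formula.)
    """
    return (raw_accuracy - chance_rate) / (1.0 - chance_rate)

print(round(chance_adjusted_score(0.80), 2))  # 0.6
```

This is why an 80% headline number and "60% better than chance" describe the same result: half the apparent score is what a coin flip would earn anyway.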
Inconsistency emerged as a second critical flaw. When the AI was asked the same question repeatedly, researchers observed cases where the model returned five "true" and five "false" answers across the 10 runs. Such volatility undermines trust in automated decision-making tools for high-stakes environments.
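One way to quantify that volatility is to count disagreements across the repeated runs. The article does not specify the researchers' metric, so the two measures below (flip count and majority share) are a minimal illustrative sketch with hypothetical function names:

```python
from collections import Counter

def answer_flips(answers):
    """Count how often consecutive runs of the same prompt disagree."""
    return sum(a != b for a, b in zip(answers, answers[1:]))

def majority_share(answers):
    """Fraction of runs matching the most common answer (1.0 = fully stable)."""
    return Counter(answers).most_common(1)[0][1] / len(answers)

# Worst-case split from the study: 5 "true" and 5 "false" over 10 runs.
runs = ["true", "false"] * 5
print(answer_flips(runs), majority_share(runs))  # 9 0.5
```

A majority share of 0.5 on a binary question means the model's answer carries no more information than a coin flip, regardless of how fluent each individual response sounds.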
Mesut Cicek, an associate professor at WSU, emphasized the distinction between fluency and understanding. He stated that current tools do not possess a brain and merely memorize patterns. The lack of conceptual understanding poses risks for sectors requiring nuanced judgment. Human oversight remains necessary.
Performance remained similar between the ChatGPT-3.5 version tested in 2024 and the ChatGPT-5 mini tested in 2025. Despite updates, the core reasoning limitations persisted across both iterations, suggesting fundamental architectural challenges rather than temporary software bugs.
The Rutgers Business Review published the findings with a call for skepticism among business leaders. Experts recommend verifying AI-generated information before integrating it into strategic plans, and designing training programs that cover system limitations alongside capabilities.
A 2024 national survey indicated that consumers hesitate to buy products marketed heavily on AI claims, and this study adds empirical weight to concerns about overhyped technological promises. Market confidence may suffer if reliance on flawed output becomes widespread.
According to Cicek, similar experiments with other AI tools have produced comparable outcomes. The work builds on earlier research urging caution around artificial-intelligence hype, and industry standards may need revision to account for these reasoning deficits.
Future development will likely focus on improving consistency rather than raw accuracy alone. Until systems demonstrate genuine comprehension, verification protocols remain essential, and stakeholders should watch how regulatory bodies address AI reliability in scientific contexts.