{"id":130006,"date":"2025-05-26T15:15:59","date_gmt":"2025-05-26T15:15:59","guid":{"rendered":"http:\/\/cryptospotters.net\/?p=130006"},"modified":"2025-05-26T15:15:59","modified_gmt":"2025-05-26T15:15:59","slug":"ai-needs-better-human-data-not-bigger-models","status":"publish","type":"post","link":"http:\/\/cryptospotters.net\/?p=130006","title":{"rendered":"AI needs better human data, not bigger models"},"content":{"rendered":"<p>Source: Cointelegraph.com News<br \/>\nOpinion by: Rowan Stone, CEO at Sapien<br \/>\nAI is a paper tiger without human expertise in data management and training practices. Despite massive growth projections, AI innovations won\u2019t be relevant if they continue training models based on poor-quality data.<br \/>\nBesides improving data standards, AI models need human intervention for contextual understanding and critical thinking to ensure ethical AI development and correct output generation.<br \/>\nAI has a \u201cbad data\u201d problem<br \/>\nHumans have nuanced awareness. They draw on their experiences to make inferences and logical decisions. AI models are, however, only as good as their training data.<br \/>\nAn AI model\u2019s accuracy doesn\u2019t entirely depend on the underlying algorithms\u2019 technical sophistication or the amount of data processed. 
Instead, accurate AI performance depends on trustworthy, high-quality data during training and analytical performance tests.<br \/>\nBad data has ramifications on multiple fronts for AI model training: it generates prejudiced output and hallucinations rooted in faulty logic, and retraining models to unlearn those bad habits wastes time and raises company costs.<br \/>\nBiased and statistically underrepresented data disproportionately amplifies flaws and skewed outcomes in AI systems, especially in healthcare and security surveillance.<br \/>\nFor example, an Innocence Project report lists multiple cases of misidentification, with a former Detroit police chief admitting that relying solely on AI-based facial recognition would produce misidentifications 96% of the time. Moreover, according to a Harvard Medical School report, an AI model used across US health systems prioritized healthier white patients over sicker black patients.<br \/>\nAI models follow the \u201cGarbage In, Garbage Out\u201d (GIGO) principle: flawed and biased data inputs, or \u201cgarbage,\u201d generate poor-quality outputs. Bad input data also creates operational inefficiencies, as project teams face delays and higher costs in cleaning data sets before resuming model training.<br \/>\nBeyond their operational effect, AI models trained on low-quality data erode companies\u2019 trust and confidence in deploying them, causing irreparable reputational damage. According to a research paper, hallucination rates for GPT-3.5 were 39.6%, stressing the need for additional validation by researchers.<br \/>\nSuch reputational damage has far-reaching consequences: it becomes harder to attract investment, and the model\u2019s market positioning suffers. At a CIO Network Summit, 21% of America\u2019s top IT leaders cited a lack of reliability as their most pressing reason for not using AI.<br \/>\nPoor training data devalues AI projects and causes enormous economic losses to companies. 
On average, incomplete and low-quality AI training data results in misinformed decision-making that costs companies 6% of their annual revenue.<br \/>\nPoor-quality training data hampers AI innovation and model training, so searching for alternative solutions is essential.<br \/>\nThe bad data problem has forced AI companies to redirect scientists toward preparing data. Almost 67% of data scientists spend their time preparing clean data sets to keep AI models from delivering misinformation.<br \/>\nAI\/ML models may struggle to produce relevant output unless specialists \u2014 real humans with proper credentials \u2014 work to refine them. This demonstrates the need for human experts to guide AI\u2019s development by ensuring high-quality, curated data for training AI models.<br \/>\nHuman frontier data is key<br \/>\nElon Musk recently said, \u201cThe cumulative sum of human knowledge has been exhausted in AI training.\u201d Nothing could be further from the truth: human frontier data is the key to driving stronger, more reliable and unbiased AI models.<br \/>\nMusk\u2019s dismissal of human knowledge is a call to use artificially produced synthetic data for fine-tuning AI model training. Unlike human-generated data, however, synthetic data lacks grounding in real-world experience, and models trained on it have historically failed at ethical judgment.<br \/>\nHuman expertise ensures meticulous data review and validation to maintain an AI model\u2019s consistency, accuracy and reliability. Humans evaluate, assess and interpret a model\u2019s output to identify biases or mistakes and ensure the results align with societal values and ethical standards.<br \/>\nMoreover, human intelligence offers unique perspectives during data preparation by bringing contextual reference, common sense and logical reasoning to data interpretation. 
This helps resolve ambiguous results, capture nuance and solve problems in training high-complexity AI models.<br \/>\nThe symbiotic relationship between artificial and human intelligence is crucial to harnessing AI\u2019s potential as a transformative technology without causing societal harm. A collaborative approach between man and machine helps unlock human intuition and creativity to build new AI algorithms and architectures for the public good.<br \/>\nDecentralized networks could be the missing piece that finally solidifies this relationship at a global scale.<br \/>\nCompanies lose time and resources when weak AI models require constant refinement from staff data scientists and engineers. Using decentralized human intervention, companies can reduce costs and increase efficiency by distributing the evaluation process across a global network of data trainers and contributors.<br \/>\nDecentralized reinforcement learning from human feedback (RLHF) makes AI model training a collaborative venture. Everyday users and domain specialists can contribute to training and receive financial incentives for accurate annotation, labeling, category segmentation and classification.<br \/>\nA blockchain-based decentralized mechanism automates compensation: contributors receive rewards based on quantifiable AI model improvements rather than rigid quotas or benchmarks. Further, decentralized RLHF democratizes data and model training by involving people from diverse backgrounds, reducing structural bias and enhancing general intelligence.<br \/>\nAccording to a Gartner survey, companies will abandon over 60% of AI projects by 2026 due to the unavailability of AI-ready data. 
Therefore, human aptitude and competence are crucial for preparing AI training data if the industry wants to contribute $15.7 trillion to the global economy by 2030.<br \/>\nData infrastructure for AI model training requires continuous improvement based on new and emerging data and use cases. Humans can ensure organizations maintain an AI-ready database through constant metadata management, observability and governance.<br \/>\nWithout human supervision, enterprises will struggle to manage the massive volume of data siloed across cloud and offshore data storage. Companies must adopt a \u201chuman-in-the-loop\u201d approach to fine-tune data sets for building high-quality, performant and relevant AI models.<br \/>\nThis article is for general information purposes and is not intended to be and should not be taken as legal or investment advice. The views, thoughts, and opinions expressed here are the author\u2019s alone and do not necessarily reflect or represent the views and opinions of Cointelegraph.<a href=\"https:\/\/cointelegraph.com\/news\/ai-needs-better-human-data-not-bigger-models?utm_source=rss_feed&amp;utm_medium=rss&amp;utm_campaign=rss_partner_inbound\" target=\"_blank\" class=\"feedzy-rss-link-icon\" rel=\"noopener\">Read More<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>Source: Cointelegraph.com News. Opinion by: Rowan Stone, CEO at Sapien. AI is a paper tiger without human expertise in data management and training practices. 
Despite massive growth projections, AI innovations won\u2019t&hellip; <\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[5],"tags":[],"_links":{"self":[{"href":"http:\/\/cryptospotters.net\/index.php?rest_route=\/wp\/v2\/posts\/130006"}],"collection":[{"href":"http:\/\/cryptospotters.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/cryptospotters.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"http:\/\/cryptospotters.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=130006"}],"version-history":[{"count":0,"href":"http:\/\/cryptospotters.net\/index.php?rest_route=\/wp\/v2\/posts\/130006\/revisions"}],"wp:attachment":[{"href":"http:\/\/cryptospotters.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=130006"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/cryptospotters.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=130006"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/cryptospotters.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=130006"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}