Goldacre warns Labour against ‘monolithic’ national data library

Scientists urge UK to adopt network approach for landmark project bringing public data together

January 27, 2025
Data library
Source: iStock/Piscine

Creating a single giant UK database for researchers and technology firms to mine for potential insights will be risky, expensive and unlikely to deliver breakthrough discoveries, leading scholars have warned about Labour’s plans for a national data library.

In the most detailed suggestions of how the UK government’s plans for a vast central archive of public data might be achieved, some of the country’s top data experts have expressed grave concerns that the project would mean the construction of a massive standalone data platform, operated by a “huge monolithic delivery organisation”, which researchers or algorithms could trawl widely for potential insights.

Few details have been provided about how Labour will deliver its manifesto commitment for science, yet prime minister Keir Starmer said the library will […] – including scans, biodata and anonymised patient data – to big tech companies to help them train artificial intelligence models.

Responding to a Wellcome Trust and Economic and Social Research Council […] into the proposed library, Ben Goldacre, director of the University of Oxford’s Bennett Institute for Applied Data Science, […] that the “default design principle from all previous government data projects has been to try to put all the data about all citizens in one big box, then let analysts log in to use it there, in whatever way they wish”.

“This makes superficial sense: ‘My team needs tax [and] health [and] schools data in one analysis, so we need all the data in one machine.’ In reality this aggregation is unnecessary: it also creates huge problems for privacy, and obstructs delivery,” explains Goldacre, a public science figure who now heads a 60-strong team of researchers and data scientists exploring GP data.

In a submission co-authored with Bennett Institute software and engineering leads Seb Bacon and Pete Stokes, Goldacre advises the government against creating “one single huge database”, arguing these “data lakes” are “terrible for privacy”, “bad for transparency and audit” and “bad for data management”.

They also tend to “create conflicts between institutions” given that a team which might have “worked for years to create a complex national database on every citizen’s tax/school/pension/etc [will not] want to hand all ‘their’ data to a national data lake”.

“They worry about losing control or sight of the uses, that users will misunderstand the…data they love, or do misleading analyses; they worry that bad analyses will affect the…team’s reputation; that others will take credit for [their] work; or get privileged access to do analyses first,” the trio warn.

Instead, Goldacre urges the creation of a “federated model” for the national data library in which “raw data in each data centre or department stays put in that source data centre”, and users follow a “take only what you need” approach rather than extracting all data.

Using this “network of standalone services, stitched together into a platform”, the library should also concentrate on improving “top three datasets [within each domain] that researchers actually want” rather than seeking to create “omnipotent systems”.

“Researchers should use the Scrapheap Challenge approach too,” they add in the submission, explaining that researchers should “reuse what exists today” in an innovative way rather than complaining about insufficient databases.

Other submissions, including one from the UK Research and Innovation-funded […], set up in 2021 to improve national data use, also back a federated structure, suggesting a “membership organisation” involving “a community of operators around a single set of technologies”.

“To build everything from scratch would be expensive and risky, and result in an immature product in an environment requiring mature security,” it says.

However, others caution against progressing the project until the problems it will solve are clearly defined and it is known whether they could be solved more easily in other ways.

A submission from […], a UK non-profit organisation focused on data sharing, highlights a former government adviser’s advice: “Don’t build a new thing unless you definitely, absolutely must.”

Instead, the government should acknowledge the “UK’s research data ecosystem is crowded” and consider whether the aims of the national data library could be solved by drawing on existing data infrastructure.

The library “could easily be a lame duck given there are already many places where researchers can already find public sector data available for research”, it says.

Noting that the UK’s long-standing failure to bring together data from multiple organisations – including Whitehall departments, NHS trusts and public bodies – is largely a governance problem, it warns the national data library may be a “technical solution to a systemic challenge”.

jack.grove@ws-2000.com
