Justin's Linklog
The following are the titles of recent articles syndicated from Justin's Linklog

LJ.Rossia.org makes no claim to the content supplied through this journal account. Articles are retrieved via a public feed supplied by the site for this purpose.

Thursday, December 18th, 2025
10:48 am
_Cheap science, real harm: the cost of replacing human participation with synthetic data_ [pdf]

    A new paper from the inimitable Abeba Birhane, on the increasingly common practice of generating synthetic data using LLMs:

    Driven by the goals of augmenting diversity, increasing speed, reducing cost, the use of synthetic data as a replacement for human participants is gaining traction in AI research and product development. This talk critically examines the claim that synthetic data can “augment diversity,” arguing that this notion is empirically unsubstantiated, conceptually flawed, and epistemically harmful. While speed and cost-efficiency may be achievable, they often come at the expense of rigour, insight, and robust science. Drawing on research from dataset audits, model evaluations, Black feminist scholarship, and complexity science, I argue that replacing human participants with synthetic data risks producing both real-world and epistemic harms at worst and superficial knowledge and cheap science at best.

    "Synthetic data: stereotypes compressed" is absolutely spot on. This doesn't give insights into human behaviour and beliefs, just into stereotypes. It is increasingly common in social science fields, under the names of "digital twins" and "silicon samples".

    Tags: data surveys abeba-birhane papers ai synthetic-data digital-twins simulation testing social-science silicon-samples

Tuesday, December 16th, 2025
10:52 am
Boost for artists in AI copyright battle as only 3% back UK active opt-out plan

    Wow, this is an absolute bollocking for the Labour plan:

    95% of the more than 10,000 people who had their say over how music, novels, films and other works should be protected [in the UK] from copyright infringements by tech companies called for copyright to be strengthened and a requirement for licensing in all cases or no change to copyright law. By contrast, only 3% of people backed the UK government’s initial preferred tech company-friendly option, which was to require artists and copyright holders to actively opt out of having their material fed into data-hungry AI systems.

    Tags: ai training data copyright law uk uk-politics llms

11:06 am
Chafa: Terminal Graphics for the 21st Century
Monday, December 15th, 2025
12:00 pm
Avoid UUID Version 4 Primary Keys | Software Engineer, Author, High Performance PostgreSQL for Rails

    A well-researched article suggesting that random UUIDs do not make a good primary key for database tables; I would tend to agree (for cases where performance is important).

    • UUID v4s increase latency for lookups, as they can’t take advantage of fast ordered lookups in B-Tree indexes
    • For new databases, don’t use gen_random_uuid() for primary key types, which generates random UUID v4 values
    • UUIDs consume twice the space of bigint
    • UUID v4 values are not meant to be secure per the UUID RFC
    • UUID v4s are random. For good performance, the whole index must be in buffer cache for index scans, which is increasingly unlikely for bigger data.
    • UUID v4s cause more page splits, which increase IO for writes with increased fragmentation, and increased size of WAL logs
    • For non-guessable, obfuscated pseudo-random codes, we can generate those from integers, which could be an alternative to using UUIDs
    • If you must use UUIDs, use time-orderable UUIDs like UUID v7
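
    A minimal sketch of the last point, assuming the UUID version 7 field layout from RFC 9562 (a 48-bit Unix millisecond timestamp, followed by version, random and variant bits). `uuid7_sketch` is a hypothetical helper written for illustration, not something from the article or from PostgreSQL, and it makes no attempt at ordering values generated within the same millisecond:

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUID using the version 7 bit layout (hypothetical helper)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # timestamp in the top 48 bits
    value |= 0x7 << 76                             # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


# Values created at later timestamps compare greater, so new rows land at
# the right-hand edge of a B-Tree index instead of at random pages.
earlier = uuid7_sketch(unix_ms=1_700_000_000_000)
later = uuid7_sketch(unix_ms=1_700_000_000_001)
assert earlier < later
```

    Because the timestamp occupies the most significant bits, index inserts are roughly append-only, which is the property the article contrasts with random UUID v4 values.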

    Tags: postgres rails databases sql mysql uuids indexing primary-keys keys lookup storage random

Tuesday, December 9th, 2025
10:04 am
‘Pig Butchering’ Scams May Have Spurred Thailand-Cambodia War

    Via TJ McIntyre -- indications that the Thailand-Cambodia war is being driven by the "pig butchering" scammer compounds operating in the border area:

    Cambodia’s 2019 census put O’Smach’s population just over 9,850, but that doesn’t include the prison-like, office-dormitory compounds that have appeared here over the past five years, with the capacity to house 10,000 more. Around 50 sites like these now line the Cambodia-Thailand border, designed to house a slice of the trillion-dollar cybercrime industry—primarily teams running investment scams, dubbed “pig butchering” for the way they fatten their targets up; sextortion scams that blackmail victims, including children, by threatening to make sexual images public; scams that impersonate police to gain account access; and fraudulent online gambling sites. Once aimed largely at the Chinese public, these now target victims worldwide and rake in tens of billions of dollars a year in Cambodia alone.

    The compounds evolved from a casino industry that caters mostly to Chinese tourists and Thai day-trippers and has been linked to human trafficking, drug smuggling, and the endangered wildlife trade. From 2016, physical casinos were dwarfed by the online gambling industry (outlawed by Cambodia in 2019), which progressed to illegal sites and outright scams. Operators rent space in casinos and purpose-built compounds controlled by Chinese criminals, Myanmar warlords, and the Cambodian political elite.

    Scam companies rely heavily on forced and trafficked labor from Asia, Africa, and Latin America to chat with targets, pose as romantic interests and employees at fake investment platforms, and persuade them to make deposits. Survivors tell us that torture, rape, and beatings are common. As the fighting raged in July, some trafficking victims reached out for help, saying they were locked in their dorms by their bosses. Videos shot from inside these sites show missiles flying overhead, explosions thundering outside, some workers appearing to break out and run, and damage from shelling in the grounds.

    Tags: scams phishing pig-butchering war grim-meathook-future thailand cambodia scammers

Monday, December 8th, 2025
11:37 am
Year in Review 2025: Hari Kunzru on AI slop and censorship

    Hari Kunzru nails it:

    These days I have a sense of falling from a precipice toward a torrent of algorithmically driven slop. It’s coming, whether we want it or not, and the consequences for our communal life will be devastating.

    It’s now seven years since Steve Bannon outlined his infamous strategy to “flood the zone with shit.” This, he said, was a way to “deal with” the media, whom he saw as the real enemies of MAGA. In practice, it has been a very effective method of censorship. With every important issue of the day, the “zone” of public discourse is immediately filled with a volume of competing narratives, often mendacious or misleading. It’s no longer necessary to suppress information. You just have to make the cost of sorting fact from fiction, in terms of time and effort, too high to pay for the ordinary person, who can’t spend all day online weighing up competing claims about robots or pedophilia or Iran.

    Generative AI now allows the production of disinformation at scale. The kind of influence ops we associate with Cambridge Analytica or the Russian Internet Research Agency can be conducted with unprecedented scope and sophistication: Thousands of fake people — tens of thousands, perhaps hundreds of thousands — making videos, posting in forums, astroturfing entire contexts in which people will live out their political lives. Couple this with the collapse of trust in all kinds of authority, and there is no one even to say what might distinguish “disinformation” from any other kind of data. [...]

    The desire to return to consensus reality is hopelessly nostalgic. Yes, there are still hard limits: The “cloud” is a physical place, scooping out mountains for raw materials and venting heat and carbon dioxide out of gargantuan data centers; political power still grows out of the barrel of a gun. But the layer of the stack in which our subjectivities are formed, the place where our beliefs about the world are shaped, is also a battleground. We must teach ourselves to navigate the torrent that is replacing consensus reality, this turbulent, treacherous mediatized flow. There is no shore to swim back to, but in the new age of magic, when reality is labile and can be recoded by the power of signs, by narrative and memes and vibes and compelling images, art becomes a truly political technology. This is not art as critique. Critique is just sincere-posting, dutifully pointing out yet again that the Medbed isn’t “real.” Art can mess with our masters in ways we don’t yet fully understand. It makes culture. It is a transmitter of values. It is the lava out of which future realities will congeal.

    Tags: misinformation disinformation facts reality future ai slop hari-kunzru steve-bannon flooding-the-zone-with-shit art media propaganda

11:00 am
Multiplying our way out of division — Matt Godbolt’s blog

    A very silly optimisation for the “binary to decimal” conversion problem:

    The compiler has turned division by a constant ten into a multiply and a shift. There’s a magic constant 0xcccccccd and a shift right of 35! Shifting right by 35 is the same as dividing by 2**35 - what’s going on? [..]

    What’s happening is that 0xcccccccd / 2**35 is very close to 1/10 (around 0.10000000000582077). By multiplying our input value by this constant first, then shifting right, we’re doing fixed-point multiplication by 1/10 - which is division by ten. The compiler knows that for all possible unsigned integer values, this trick will always give the right answer.
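
    The trick is easy to check by hand. This is a small sketch using the constant and shift quoted above; `div10` and `to_decimal` are names made up for this example, and the 32-bit range check is an assumption about what "all possible unsigned integer values" means here:

```python
MAGIC = 0xCCCCCCCD  # constant quoted in the article
SHIFT = 35          # shift quoted in the article


def div10(n):
    """Divide an unsigned 32-bit integer by ten using multiply and shift."""
    if not 0 <= n < 2**32:
        raise ValueError("n must fit in an unsigned 32-bit integer")
    return (n * MAGIC) >> SHIFT


def to_decimal(n):
    """Binary-to-decimal conversion built on div10, with no '/' or '%'."""
    digits = []
    while True:
        quotient = div10(n)
        remainder = n - 10 * quotient
        digits.append(chr(ord("0") + remainder))
        n = quotient
        if n == 0:
            break
    return "".join(reversed(digits))


# Spot checks at the edges of the 32-bit range.
for value in (0, 9, 10, 99, 2**31, 2**32 - 1):
    assert div10(value) == value // 10
```

    The reason it works: 10 * 0xcccccccd exceeds 2**35 by only 2, so the product overshoots n/10 by n / (5 * 2**35), which stays below 1/40 for any 32-bit n and so never pushes the result past the next integer.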

    Tags: hacks optimization bit-hacking binary decimal fixed-point arithmetric tricks

Thursday, December 4th, 2025
6:26 pm
Large Language Models As The Tales That Are Sung

    A thought-provoking read on LLMs, poetry, the oral tradition, and Gene Wolfe:

    "Even if LLMs are made out of poetry, they are incapable of producing poems. Or in Wolfe’s language, both the epic form and LLMs are story, but are incapable of telling stories. That requires the marriage of structure and intention that human mediation provides. LLMs are a kind of composite of the singing of tales, but are not singers, even if we sometimes misconstrue them as such."

    Tags: llms text poetry words language gene-wolfe ascians storytelling structure culture

1:08 pm
‘Unauthorized’ Edit to Ukraine’s Frontline Maps Point to Polymarket’s War Betting

    A live map that tracks frontlines of the war in Ukraine was edited to show a fake Russian advance on the city of Myrnohrad on November 15. The edit coincided with the resolution of a bet on Polymarket, a site where users can bet on anything from basketball games to presidential election and ongoing conflicts. If Russia captured Myrnohrad by the middle of November, then some gamblers would make money. According to the map that Polymarket relies on, they secured the town just before 10:48 UTC on November 15. The bet resolved and then, mysteriously, the map was edited again and the Russian advance vanished.

    Tags: polymarket betting war future cyberpunk fraud ukraine russia

Wednesday, December 3rd, 2025
3:19 pm
WallBonito
12:56 pm
Building a Medallion architecture with ClickHouse

    Walkthrough of the "Medallion" architecture concept, which comprises three layers (or stages), each serving distinct purposes in the data pipeline:

    • Bronze layer - This layer acts as the landing area for raw, unprocessed data directly from the source system: simply put a "staging area". This data is stored in its original structure with minimal transformations and additional metadata. This layer is optimized for fast ingestion, and can provide an historical archive of source data that is always available for reprocessing or debugging. Whether the bronze layer should store all data is a point of contention, with some users preferring to filter the data and apply transformations, e.g., flattening JSON, renaming fields, or filtering out poorly formed data. We're not overly opinionated here but recommend optimizing the storage for consumption by the silver layer only - not other consumers.

    • Silver layer - Here, data is cleansed, deduplicated, and conformed to a unified schema, with raw data from the previous Bronze layer being enriched and transformed to provide a more accurate and consistent view. This data can be consistent and usable for enterprise-wide use cases such as machine learning and analytics. The data model should emerge at this layer with a focus placed on ensuring primary and foreign keys are consistent to simplify future joins. While not common, applications and downstream consumers can read from this layer. These are typically business-wide applications that need the entire cleansed dataset, e.g., ML workflows. Importantly, data quality will not improve after this stage, only the ease with which it can be queried efficiently.

    • Gold layer - This layer aims to have fully curated, business-ready, and project-specific datasets that make the data more accessible (and performant) to consumers. These datasets are often denormalized, or pre-aggregated, for optimal read performance and may have been composed of multiple tables from the previous silver stage. The focus here is on applying final transformations and ensuring the highest data quality for consumption by end-users or applications, such as reporting and user-facing dashboards.

    This layered approach to data pipelines aims to efficiently address challenges like data quality, duplication and schema inconsistencies. By transforming raw data incrementally, the Medallion architecture aims to ensure a clear lineage and progressively refined datasets that are ready for analysis or operational use.
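
    As a rough illustration of the three layers, here is a toy in-memory sketch. It does not use ClickHouse at all; the order records, field names and function names are invented for the example, and a real pipeline would use tables and materialized transformations rather than Python lists:

```python
import json
from collections import defaultdict


def to_bronze(raw_lines, source):
    """Bronze: keep each payload untouched, only add ingestion metadata."""
    return [
        {"raw": line, "source": source, "line_no": line_no}
        for line_no, line in enumerate(raw_lines)
    ]


def to_silver(bronze_rows):
    """Silver: parse, drop malformed rows, deduplicate, conform the schema."""
    by_key = {}
    for row in bronze_rows:
        try:
            record = json.loads(row["raw"])
            clean = {
                "order_id": int(record["order_id"]),
                "customer": str(record["customer"]).strip().lower(),
                "amount": float(record["amount"]),
            }
        except (ValueError, KeyError, TypeError):
            continue  # poorly formed data stays in bronze for debugging
        by_key[clean["order_id"]] = clean  # last record wins per primary key
    return list(by_key.values())


def to_gold(silver_rows):
    """Gold: pre-aggregate into a read-optimised, project-specific shape."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)
```

    Note that the bronze rows are never modified, so the silver and gold stages can be recomputed from them at any time, which is the reprocessing property described for the bronze layer above.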

    Tags: medallion-architecture data architecture pipelines clickhouse

Friday, November 28th, 2025
1:04 pm
PocketBase
Thursday, November 27th, 2025
11:23 am
Questioning an Interface: From Parquet to Vortex

    Interesting -- a new, GPU-optimised storage format:

    Like Parquet, Vortex minimizes bytes on disk. However, Vortex is also designed with a core use-case in mind: decoding and querying data directly from object storage on GPUs. This key idea translates very well to our use-case even though we don’t run our queries on GPUs (yet?). Specifically, the file format is designed to maximize throughput and parallelism from the metadata format to the SIMD/SIMT friendly encodings used.

    Crucially, it also acknowledges that part of making queries fast is not only good filter pushdown, but also general-purpose compute pushdown. If anything cannot be pushed down, Vortex’s encodings can be tuned to offer zero-copy conversion to Arrow for further query execution using any general-purpose query execution engine.

    Vortex also learns from Parquet’s limitations around extensibility and aims to be as future-proof as possible. New encodings can ship with WASM decoders so encoding adoption is not limited by reader libraries having to implement support. The main Rust library is also designed to be fully extensible, so you can write your own layouts/encodings and plug them in as first-class citizens.

    Given how well Vortex’s design matched our needs, we tried it out and got a 70% average performance improvement on all our queries. With the newer encodings that Vortex offers, we got 10% better uncompressed storage size and only 3% larger compressed storage size compared to snappy-compressed Parquet.

    Tags: gpu vortex parquet compression storage file-formats files pushdown simd

Wednesday, November 26th, 2025
11:57 am
The shameful attacks on the Covid inquiry prove it: the right is lost in anti-science delusion

    Polly Toynbee in the Guardian writes, "The shameful attacks on the Covid inquiry prove it: the right is lost in anti-science delusion":

    That number will stay fixed for ever in public memory: 23,000 people died because Boris Johnson resisted locking the country down in time. As Covid swept in, and with horrific images of Italian temporary morgues in tents, he went on holiday and took no calls. With the NHS bracing to be “overwhelmed” by the virus, he rode his new motorbike, walked his dog and hosted friends at Chevening.

    Nothing is surprising about that: he was ejected from Downing Street and later stepped down as an MP largely for partying and lying to parliament about it. Everyone knew he was a self-aggrandising fantasist with a “toxic and chaotic culture” around him. But this is not just about one narcissistic politician. It’s about his entire rightwing coterie of libertarians and their lethally dominant creed in the UK media.

    I'm glad the science side kept their receipts but I fear this argument will be relitigated indefinitely by anti-lockdown libertarians.

    Tags: lockdowns covid-19 history uk uk-politics medicine health pandemics boris-johnson

Monday, November 24th, 2025
12:41 pm
The developer productivity paradox: Why faster coding doesn’t mean faster software delivery

    The paradox is this simple gap: high individual confidence in AI speed, versus stubborn organizational metrics that just won’t budge:

    • Perceived speed is high: Adoption is near-universal (90% usage reported), and confidence is overwhelming (over 80% believe AI has increased their productivity). AI is great at handling cognitive toil and boilerplate, which lets engineers generate bigger code batches and feel genuinely productive.
    • Systemic failure persists: The reality, confirmed by DORA in their 2025 report, is that the system often fails to carry or amplify these individual gains. The challenge is that AI models, as massive generative systems, inherently produce failures (mispredictions). As code volume increases, this constant misprediction rate impacts systemic stability.

    Interestingly, even leading providers of AI solutions like OpenAI and Anthropic continue to be challenged by the issue of hallucinations and mispredictions, as well as the risks generated by AI. Speaking at a university in India, Sam Altman recently said “I probably trust the answers that come out of ChatGPT the least of anybody on Earth”.

    Without strategies and tools for alleviating the issues AI code produces downstream — such as improved observability to understand where something is going wrong — the “much bigger engine” of AI may not actually speed up software delivery after all.

    Tags: ai llms coding productivity gradle dpe hallucinations software work how-we-work

Thursday, November 20th, 2025
10:06 am
How Slide Rules Work
Wednesday, November 19th, 2025
11:50 am
Mic92/strace-macos
10:38 am
Cloudflare outage on November 18, 2025

    tl;dr: a configuration-generation tool had buggy error handling code. Triggered by a permissions change, it generated over-large configs which then caused a crash in buggy config-reading code in their Bot Management module. This configuration was rolled out globally within minutes.

    As @kiall in ITC Slack notes: "the one thing I'd be pushing on after an outage like this (config mistake, propagated globally..) is 'treat config like any other deployment - with a slow and steady rollout' -- and this is not called out in the postmortem." I agree this is a significant oversight.
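
    For illustration only, a "slow and steady" config rollout might look something like the sketch below. Nothing here reflects Cloudflare's actual tooling; the stage names and the `apply`, `healthy` and `rollback` callbacks are hypothetical stand-ins supplied by the caller:

```python
def staged_rollout(config, stages, apply, healthy, rollback):
    """Push config one stage at a time, stopping at the first unhealthy stage.

    Returns None if every stage stayed healthy, otherwise the name of the
    stage that failed (after rolling back everything applied so far).
    """
    applied = []
    for stage in stages:
        apply(stage, config)
        applied.append(stage)
        if not healthy(stage):
            # Undo in reverse order so the widest stage is reverted first.
            for done in reversed(applied):
                rollback(done)
            return stage
    return None
```

    Called with stages such as `["canary", "one-region", "global"]`, a config that crashes its reader is caught at the canary stage instead of reaching every machine within minutes.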

    Tags: postmortem cloudflare outages configuration deployment cloud error-handling rollouts via:itc

Thursday, November 13th, 2025
10:00 am
Bitcoin’s big secret: How cryptocurrency became law enforcement’s secret weapon

    At the 2025 Bitwarden Open Source Security Summit, WIRED's Andy Greenberg sat down for a fireside chat with GigaOm analyst Paul Stringfellow to discuss a revelation that turned his decades-long reporting on its head: Bitcoin became a criminal's worst nightmare:

    In 2011, Greenberg thought he'd discovered the story of a lifetime: digital cash that promised complete anonymity. A decade later, that story flipped entirely.

    "I had this slow-motion epiphany that I was entirely wrong about Bitcoin. It was, in fact, the opposite of untraceable."

    But here's the paradox: if cryptocurrency tracing is so powerful, why do ransomware attacks, pig butchering scams, and North Korean hackers continue to steal billions?

    The answer: identifiability isn't the same as accountability.

    Tags: accountability prosecutions law policing bitcoin cryptocurrency andy-greenberg crime anonymity

Tuesday, November 11th, 2025
10:47 am
Real VT102 emulation with MAME
