GeForce GPU giant has been data scraping 80 years' worth of videos every day for AI training to 'unlock various downstream applications critical to Nvidia'

Leaked documents, including spreadsheets, emails, and chat messages, show that Nvidia has been using millions of videos from YouTube, Netflix, and other sources to train an AI model for use in its Omniverse, autonomous vehicle, and digital avatar platforms.

The astonishing, but perhaps not surprising, scope of the data scraping was reported by 404 Media, which investigated the documents. It found that an internal project codenamed Cosmos (sharing a name with, but separate from, Nvidia's Cosmos Deep Learning service) had staff use dozens of virtual PCs on Amazon Web Services (AWS) to download so many videos per day that Nvidia accumulated over 30 million URLs in the space of one month.

Copyright laws and usage rights were repeatedly discussed by the employees, who found some creative ways to avoid directly violating them. For example, Nvidia used Google's cloud service to download the YouTube-8M dataset, as directly downloading the videos isn't permitted by the terms of service.

In a leaked Slack channel discussion, one person remarked that "we cleared the download with Google/YouTube ahead of time and dangled as a carrot that we were going to do so using Google Cloud. After all, usually, for 8 million videos, they would get lots of ad impressions, revenue they lose out on when downloading for training, so they should get some money out of it."

404 Media asked Nvidia to comment on the legal and ethical aspects of using copyrighted material for AI training, and the company replied that it was "in full compliance with the letter and the spirit of copyright law."

Some of the datasets are permitted for academic use only, and although Nvidia does conduct a considerable amount of research (internally and with other institutions), the leaked materials clearly show that this data scraping was intended for commercial purposes.

Nvidia isn't the only firm to be doing this, of course—OpenAI and Runway have both been accused of knowingly using

Read more on pcgamer.com