The Tech World Is ‘Disrupting’ Book Publishing. But Do We Want Effortless Art?

Publishing is one of many fields poised for disruption by tech companies and artificial intelligence. New platforms and approaches, like a book imprint by Microsoft and a self-publishing tech startup that uses AI, promise to make publishing faster and more accessible than ever.

But they also may threaten jobs—and demand a reconsideration of the status and role of books as cultural objects. And what will be the impact of TikTok owner ByteDance’s move into traditional book publishing?

Microsoft’s 8080 Books

Last month, Microsoft announced a new book imprint, 8080 Books. It will focus on nonfiction titles relating to technology, science, and business.

8080 Books plans “to test and experiment with the latest tech to accelerate and democratize book publishing,” though as some skeptics have noted, it is not yet entirely clear what this will entail.

The first title, No Prize for Pessimism, by Sam Schillace (Microsoft’s deputy chief technology officer), arguably sets the tone for the imprint. These “letters from a messy tech optimist” urge readers to embrace the disruptive potential of new technologies (AI is name-checked in the blurb), arguing optimism is essential for innovation and creativity. You can even discuss the book with its own bespoke chatbot.

Elsewhere, in the self-publishing space, tech startup Spines aims to bring 8,000 new books to market each year. For a fee, authors can use the publishing platform’s AI to edit, proofread, design, format, and distribute their books.

The move has been condemned by some authors and publishers, but Spines (like Microsoft) states its aim is to make publishing more open and accessible. Above all, it aims to make it faster, reducing the time it takes to publish to just a fortnight—rather than the long months of editing, negotiating, and waiting required by traditional publishing.

TikTok Is Publishing Books Too

Technological innovations are not just being used to speed up the publishing process, but also to identify profitable audiences, emerging authors, and genres that will sell. ByteDance, the Chinese tech giant that owns TikTok, launched its publishing imprint 8th Note Press (initially digital only) last year.

The company is now partnering with Zando (an independent publishing company whose other imprints include one by actor Sarah Jessica Parker and another by the Pod Save America team’s Crooked Media) to produce a fiction range targeted at Gen Z readers. It will produce print books, to be sold in bookshops, from February.

8th Note Press focuses on the fantasy and romance genres (and authors) generating substantial followings on BookTok, the TikTok community proving invaluable for marketing and promoting new fiction. In the United States, authors with a strong presence on BookTok have seen a 23 percent growth in print sales in 2024, compared to 6 percent growth overall.

Access to TikTok’s data and the ability to engineer viral videos could give 8th Note Press a serious advantage over legacy publishers in this space.

Hundreds of AI Self-Publishing Startups

These initiatives reflect some broader industry trends. Since OpenAI first demoed ChatGPT in 2022, approximately 320 publishing startups have emerged. Almost all of them revolve around AI in some way. There is speculation that the top five global publishers all have their own proprietary internal AI systems in the works.

Spotify’s entry into the audiobook market in 2023 has been described by its CEO as a game changer, and the company is now using AI to recommend books to listeners. Other companies, like Storytel and Nuanxed, are using AI to autogenerate audiobook narration and expedite translations.

The embrace of AI may produce some useful innovations and efficiencies in publishing processes. It will almost certainly help publishers promote their authors and connect books with invested audiences. But it will have an impact on people working in the sector.

Companies like Storytel are using AI to narrate audiobooks. Image Credit: Karolina Grabowska/Pexels

Publishing houses have been consistently reducing in-house staff since the 1990s and relying more heavily on freelancers for editorial and design tasks. It would be naïve to think AI and other emerging technologies won’t be used to further reduce costs.

We are moving rapidly towards a future where once-important roles in the publishing sector—editing, translation, narration and voice acting, book design—will be increasingly performed by machines.

When queried, Spines’ CEO and cofounder, Yehuda Niv, has said: “We are not here to replace human creativity.” He emphasized his belief that this automation will allow more writers to access the book market.

Storytel and Nuanxed have both suggested the growth of audiobook circulation will compensate for the replacement of human actors and translators. Exactly who will benefit the most from this growth—authors or faceless shareholders—remains to be seen.

Side Hustles, Grifts, and ‘Easy’ Writing

I appreciate Schillace’s genuine, thoughtful optimism about AI and other new technologies. (I will admit to not having read his book yet, but did have a stimulating conversation with its bot.) But my mind is drawn back to the techno-utopianists of the 19th century, like Edward Bellamy.

In his 1888 novel, Looking Backward, Bellamy speculates on a future in which art and literature flourish once advanced automation has freed people from the drudgery of miserable labor, leaving them with more time for cultural pursuits.

The inverse seems to be occurring now. Previously important and meaningful forms of cultural work are being increasingly automated.

I could be shortsighted about this, of course. The publishing disruption is just getting underway, and we’ve already made some great strides towards dispensing with the admittedly often quite miserable labor of writing itself.

We’re moving closer to ‘dispensing with the admittedly often quite miserable labour of writing itself’. Image Credit: Polina Zimmerman/Pexels

Soon after the launch of ChatGPT, science fiction magazines in the US had to close submissions after being inundated with AI-generated short stories, many of them almost identical. Today, so many AI-assisted books are being published on Amazon that the company has had to limit self-publishing authors to just three uploads per day.

AI-assisted publishing enterprises range from side hustles focused on republishing editions of public-domain texts to grifts targeting unsuspecting readers and writers. All these schemes are premised on the idea that writing can be rendered easy and effortless.

The use of AI may have other, delayed costs, though.

Can AI Be a ‘Thinking Partner’?

When I was younger, writing and publishing a lousy short story just obliterated my time and personal relationships. Now, I can do so with a one-sentence prompt, if I have a mind to—but apparently, this will destroy a lake somewhere.

Of course, as the No Prize for Pessimism bookbot takes pains to remind me, using AI in the writing process needn’t be a matter of lazy autogeneration. It can be used for generative drafting, which is then revised, again and again, and integrated into the text.

AI can operate as a “thinking partner,” helping the writer with ideation and brainstorming. The technology is in its infancy, after all: There is bound to be some initial mess. But whatever way it is used, AI will help writers get to publication faster.

8080 Books’ charter offers a lot of rhetorical praise for the form of the book. We are told that books “matter,” that they impart “knowledge and wisdom,” that they “build empathy.” 8080 Books also wants to “accelerate the publishing process” and see less “lag” between the manuscript submission and its arrival in the marketplace. It wants books that are immediate and timely.

Slow Can Be Good

But what is a book if it arrives easily and at speed? Regardless of whether it is AI-generated or AI-assisted, it won’t be quite the same medium.

For much of their history, books have been defined by slowness and effort, both in writing and the journey towards publication. A book doesn’t always need to be up to date or of the moment.

Indeed, the hope might be that the slowness and effort of its production can lead to the book outlasting its immediate context and remaining relevant in other times and places.

Greater speed and broader access may be laudable aims for these publishing innovations. But they will also likely lead to greater disposability—at least in the short term—for both publishing professionals and the books themselves.

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Image Credit: Muhammed ÖÇAL on Unsplash

Blurry, Morphing, and Surreal: A New AI Aesthetic Is Emerging in Film

Type text into AI image and video generators, and you’ll often see outputs of unusual, sometimes creepy, pictures.

In a way, this is a feature, not a bug, of generative AI. And artists are wielding this aesthetic to create a new storytelling art form.

The tools, such as Midjourney to generate images, Runway and Sora to produce videos, and Luma AI to create 3D objects, are relatively cheap or free to use. They allow filmmakers without access to major studio budgets or soundstages to make imaginative short films for the price of a monthly subscription.

I’ve studied these new works as the co-director of the AI for Media & Storytelling studio at the University of Southern California.

Surveying the increasingly captivating output of artists from around the world, I partnered with curators Jonathan Wells and Meg Grey Wells to produce the Flux Festival, a four-day showcase of experiments in AI filmmaking, in November 2024.

While this work remains dizzyingly eclectic in its stylistic diversity, I would argue that it offers traces of insight into our contemporary world. I’m reminded that in both literary and film studies, scholars believe that as cultures shift, so does the way we tell stories.

With this cultural connection in mind, I see five visual trends emerging in film.

1. Morphing, Blurring Imagery

In her “NanoFictions” series, the French artist Karoline Georges creates portraits of transformation. In one short, “The Beast,” a burly man mutates from a two-legged human into a hunched, skeletal cat, before morphing into a snarling wolf.

The metaphor—man is a monster—is clear. But what’s more compelling is the thrilling fluidity of transformation. There’s a giddy pleasure in seeing the figure’s seamless evolution that speaks to a very contemporary sensibility of shapeshifting across our many digital selves.

This sense of transformation continues in the use of blurry imagery that, in the hands of some artists, becomes an aesthetic feature rather than a vexing problem.

Theo Lindquist’s “Electronic Dance Experiment #3,” for example, begins as a series of rapid-fire shots showing flashes of nude bodies in a soft smear of pastel colors that pulse and throb. Gradually it becomes clear that this strange fluidity of flesh is a dance. But the abstraction in the blur offers its own unique pleasure; the image can be felt as much as it can be seen.

2. The Surreal

Thousands of TikTok videos demonstrate how cringy AI images can get, but artists can wield that weirdness and craft it into something transformative. The Singaporean artist known as Niceaunties creates videos that feature older women and cats, riffing on the concept of the “auntie” from Southeast and East Asian cultures.

In one recent video, the aunties let loose clouds of powerful hairspray to hold up impossible towers of hair in a sequence that grows increasingly ridiculous. Even as they’re playful and poignant, the videos created by Niceaunties can pack a political punch. They comment on assumptions about gender and age, for example, while also tackling contemporary issues such as pollution.

On the darker side, in a music video titled “Forest Never Sleeps,” the artist known as Doopiidoo offers up hybrid octopus-women, guitar-playing rats, rooster-pigs, and a wood-chopping ostrich-man. The visual chaos is a sweet match for the accompanying death metal music, with surrealism returning as a powerful form.

A group of 12 wailing women with long black hair and tentacles.
Doopiidoo’s uncanny music video ‘Forest Never Sleeps’ leverages artificial intelligence to create surreal visuals. Image Credit: Doopiidoo

3. Dark Tales

The often-eerie vibe of so much AI-generated imagery works well for chronicling contemporary ills, a fact that several filmmakers use to unexpected effect.

In “La Fenêtre,” Lucas Ortiz Estefanell of the AI agency SpecialGuestX pairs diverse image sequences of people and places with a contemplative voice-over to ponder ideas of reality, privacy, and the lives of artificially generated people. At the same time, he wonders about the strong desire to create these synthetic worlds. “When I first watched this video,” recalls the narrator, “the meaning of the image ceased to make sense.”

In the music video titled “Closer,” based on a song by Iceboy Violet and Nueen, filmmaker Mau Morgó captures the world-weary exhaustion of Gen Z through dozens of youthful characters slumbering, often under the green glow of video screens. The snapshot of a generation that has come of age in the era of social media and now artificial intelligence, pictured here with phones clutched close to their bodies as they murmur in their sleep, feels quietly wrenching.

A pre-teen girl dozes while holding a video game controller, surrounded by bright screens.
The music video for ‘Closer’ spotlights a generation awash in screens. Image Credit: Mau Morgó, Closer – Violet, Nueen

4. Nostalgia

Sometimes filmmakers turn to AI to capture the past.

Rome-based filmmaker Andrea Ciulu uses AI to reimagine 1980s East Coast hip-hop culture in “On These Streets,” which depicts the city’s expanse and energy through breakdancing as kids run through alleys and then spin magically up into the air.

Ciulu says that he wanted to capture New York’s urban milieu, all of which he experienced at a distance, from Italy, as a kid. The video thus evokes a sense of nostalgia for a mythic time and place to create a memory that is also hallucinatory.

Similarly, David Slade’s “Shadow Rabbit” borrows black-and-white imagery reminiscent of the 1950s to show small children discovering miniature animals crawling about on their hands. In just a few seconds, Slade depicts the enchanting imagination of children and links it to generated imagery, underscoring AI’s capacities for creating fanciful worlds.

5. New Times, New Spaces

In his video for the song “The Hardest Part” by Washed Out, filmmaker Paul Trillo creates an infinite zoom that follows a group of characters down the seemingly endless aisle of a school bus, through the high school cafeteria and out onto the highway at night. The video perfectly captures the zoominess of time and the collapse of space for someone young and in love haplessly careening through the world.

The freewheeling camera also characterizes the work of Montreal-based duo Vallée Duhamel, whose music video “The Pulse Within” spins and twirls, careening up and around characters who are cut loose from the laws of gravity.

In both music videos, viewers experience time and space as a dazzling, topsy-turvy vortex where the rules of traditional time and space no longer apply.

A car in flames mid-air on a foggy night.
In Vallée Duhamel’s ‘The Pulse Within,’ the rules of physics no longer apply. Image Credit: Vallée Duhamel

Right now, in a world where algorithms increasingly shape everyday life, many works of art are beginning to reflect how intertwined we’ve become with computational systems.

What if machines are suggesting new ways to see ourselves, as much as we’re teaching them to see like humans?

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Banner Image: A still from Theo Lindquist’s short film ‘Electronic Dance Experiment #3.’

Google DeepMind’s New AI Weatherman Tops World’s Most Reliable System

This was another year of rollercoaster weather. Heat domes broiled the US southwest. California experienced a “second summer” in October, with multiple cities breaking heat records. Hurricane Helene—and just a few weeks later, Hurricane Milton—pummeled the Gulf Coast, unleashing torrential rainfall and severe flooding. What shocked even seasoned meteorologists was how fast the hurricanes intensified, with one choking up as he said, “This is just horrific.”

When bracing for extreme weather, every second counts. But planning measures rely on accurate predictions. Here’s where AI comes in.

This week, Google DeepMind unveiled an AI that predicts weather 15 days in advance in minutes, rather than the hours usually needed with traditional models. In a head-to-head with the European Centre for Medium-Range Weather Forecasts’ model (ENS)—the best “medium-range” weather forecaster today—the AI won over 90 percent of the time.

Dubbed GenCast, the algorithm is DeepMind’s latest foray into weather prediction. Last year, the company unleashed a version that produced strikingly accurate 10-day forecasts. GenCast differs in its machine learning architecture. True to its name, it’s a generative AI model, roughly similar to those that power ChatGPT, Gemini, or generate images and videos from a text prompt.

The setup gives GenCast an edge over previous models, which usually provide a single weather path prediction. GenCast, in contrast, pumps out 50 or more predictions—each representing a potential weather trajectory, while assigning their likelihood.

In other words, the AI “imagines” a multiverse of future weather possibilities and picks the one with the largest chance of occurring.

GenCast didn’t just excel at day-to-day weather prediction. It also beat ENS at predicting extreme weather—heat, cold, and high wind speeds. Challenged with data from Typhoon Hagibis—the deadliest tropical cyclone to strike Japan in decades—GenCast visualized possible routes seven days before landfall.

“As climate change drives more extreme weather events, accurate and trustworthy forecasts are more essential than ever,” wrote study authors Ilan Price and Matthew Wilson in a DeepMind blog post.

Embracing Uncertainty

Predicting weather is notoriously difficult. This is largely because weather is a chaotic system. You might have heard of the “butterfly effect”—a butterfly flaps its wings, stirring a tiny change in the atmosphere and triggering tsunamis and other weather disasters a world apart. Although just a metaphor, it highlights that any small changes in initial weather conditions can rapidly spread across large regions, changing weather outcomes.

For decades, scientists have tried to emulate these processes using physical simulations of the Earth’s atmosphere. By gathering data from weather stations across the globe and satellites, they’ve written equations mapping current estimates of the weather and forecasting how they’ll change over time.

The problem? The deluge of data takes hours, if not days, to crunch on supercomputers, and consumes a huge amount of energy.

AI may be able to help. Rather than mimicking the physics of atmospheric shifts or the swirls of our oceans, these systems slurp up decades of data to find weather patterns. GraphCast, released in 2023, captured more than a million points across our planet’s surface to predict 10-day weather in less than a minute. Others in the race to improve weather forecasting are Huawei’s Pangu-Weather and NowcastNet, both developed in China. The latter gauges the chance of rain with high accuracy—one of the toughest aspects of weather prediction.

But weather is finicky, and GraphCast and similar weather-prediction AI models are deterministic: They forecast only a single weather trajectory. The weather community is now increasingly embracing “ensemble models,” which predict a range of possible scenarios.

“Such ensemble forecasts are more useful than relying on a single forecast, as they provide decision makers with a fuller picture of possible weather conditions in the coming days and weeks and how likely each scenario is,” wrote the team.

Cloudy With a Chance of Rain

GenCast tackles the weather’s uncertainty head-on. The AI mainly relies on a diffusion model, a type of generative AI. Overall, it incorporates 12 metrics about the Earth’s surface and atmosphere—such as temperature, wind speed, humidity, and atmospheric pressure—traditionally used to gauge weather.

The team trained the AI on 40 years of historical weather data from a publicly available database up to 2018. Rather than asking for one prediction, they had GenCast spew out a number of forecasts, each one starting with a slightly different weather condition—a different “butterfly,” so to speak. The results were then combined into an ensemble forecast, which also predicted the chance of each weather pattern actually occurring.
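To make the mechanics concrete, here is a minimal sketch of ensemble forecasting in Python. A toy chaotic update stands in for GenCast's diffusion sampler, and all names, numbers, and dynamics are illustrative rather than DeepMind's:

```python
import numpy as np

rng = np.random.default_rng(0)

def step(state):
    # Toy chaotic update (logistic map) standing in for one model step.
    return 3.9 * state * (1 - state)

def forecast(initial, days, noise=0.01):
    # Perturb the starting condition: a different "butterfly" per member.
    state = np.clip(initial + rng.normal(0, noise), 0.01, 0.99)
    for _ in range(days):
        state = step(state)
    return state

# 50 trajectories, each from a slightly different initial condition.
members = np.array([forecast(0.5, days=15) for _ in range(50)])

# The ensemble summarizes the spread of possible outcomes.
print("ensemble mean:", members.mean())
print("chance of 'value > 0.5':", (members > 0.5).mean())
```

Because each member is an equally plausible trajectory, the fraction of members showing a given outcome doubles as an estimate of that outcome's probability.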

When tested with weather data from 2019, which GenCast had never seen, the AI outperformed the current leader, ENS—especially for longer-term forecasting up to 15 days. Checked against recorded data, the AI outperformed ENS 97 percent of the time across 1,300 measures of weather prediction.

GenCast’s predictions are also blazingly fast. Compared to the hours on supercomputers usually needed to generate results, the AI churned out predictions in roughly eight minutes. If adopted, the system could add valuable time for emergency notices.

All for One

Although GenCast wasn’t explicitly trained to forecast severe weather patterns, it was able to predict the path of Typhoon Hagibis before landfall in central Japan. One of the deadliest storms in decades, the typhoon flooded neighborhoods up to the rooftops as water broke through levees and took out much of the region’s electrical power.

GenCast’s ensemble prediction was like a movie. It began with a relatively wide range of possible paths for Typhoon Hagibis seven days before landfall. As the storm edged closer, however, the AI got more accurate, narrowing its predictive path. Although not perfect, GenCast painted an overall trajectory of the devastating cyclone that closely matched recorded data.

Given a week of lead time, “GenCast can provide substantial value in decisions about when and how to prepare for tropical cyclones,” wrote the authors.

Accurate and longer predictions don’t just help prepare for future climate challenges. They could also help optimize renewable energy planning. Take wind power. Predicting where, when, and how strong wind is likely to blow could increase the power source’s reliability—reducing costs and potentially upping adoption of the technology. In a proof-of-concept analysis, GenCast was more accurate than ENS at predicting total wind power generated by over 5,000 wind power plants across the globe, opening the possibility of building wind farms based on data.

GenCast isn’t the only AI weatherman. Nvidia’s FourCastNet also uses generative AI to predict weather with a lower energy cost than traditional methods. Google Research has also engineered myriad weather-predicting algorithms, including NeuralGCM and SEEDS. Some are being integrated into Google search and maps, including rain forecasts, wildfires, flooding, and heat alerts. Microsoft joined the race with ClimaX, a flexible AI that can be tailored to generate predictions from hours to months ahead (with varying accuracies).

All this is not to say AI will be taking jobs from meteorologists. The DeepMind team stresses that GenCast wouldn’t be possible without foundational work from climate scientists and physics-based models. To give back, they’re releasing aspects of GenCast to the wider weather community to gain further insights and feedback.

Image Credit: NASA

Most Supposedly ‘Open’ AI Systems Are Actually Closed—and That’s a Problem

“Open” AI models have a lot to give. The practice of sharing source code with the public spurs innovation and democratizes AI as a tool.

Or so the story goes. A new analysis in Nature puts a twist on the narrative: Most supposedly “open” AI models, such as Meta’s Llama 3, are hardly that.

Rather than encouraging or benefiting small startups, the “rhetoric of openness is frequently wielded in ways that…exacerbate the concentration of power” in large tech companies, wrote David Widder at Cornell University, Meredith Whittaker at Signal Foundation, and Sarah West at AI Now Institute.

Why care? Debating AI openness seems purely academic. But with growing use of ChatGPT and other large language models, policymakers are scrambling to catch up. Can models be allowed in schools or companies? What guiderails should be in place to protect against misuse?

And perhaps most importantly, most AI models are controlled by Google, Meta, and other tech giants, which have the infrastructure and financial means to either develop or license the technology—and in turn, guide the evolution of AI to meet their financial incentives.

Lawmakers around the globe have taken note. This year, the European Union adopted the AI Act, the world’s first comprehensive legislation to ensure AI systems used are “safe, transparent, non-discriminatory, and environmentally friendly.” As of September, there were over 120 AI bills in Congress, addressing privacy, accountability, and transparency.

In theory, open AI models can deliver those needs. But “when policy is being shaped, definitions matter,” wrote the team.

In the new analysis, they broke down the concept of “openness” in AI models across the entire development cycle and pinpointed how the term can be misused.

What Is ‘Openness,’ Anyway?

The term “open source” is nearly as old as software itself.

At the turn of the century, small groups of computing rebels released code for free software that anyone could download and use in defiance of corporate control. They had a vision: Open-source software, such as freely available word processors similar to Microsoft’s, could level the playing field for little guys and allow access to people who couldn’t afford the technology. The code also became a playground, where eager software engineers fiddled around with the code to discover flaws in need of fixing—resulting in more usable and secure software.

With AI, the story’s different. Large language models are built with numerous layers of interconnected artificial “neurons.” Similar to their biological counterparts, the structure of those connections heavily influences a model’s performance in a specific task.

Models are trained by scraping the internet for text, images, and increasingly, videos. As this training data flows through their neural networks, they adjust the strengths of their artificial neurons’ connections—dubbed “weights”—so that they generate desired outputs. Most systems are then evaluated by people to judge the accuracy and quality of the results.
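In miniature, that weight-adjustment loop looks like the sketch below: a toy next-character predictor whose weights are nudged by gradient descent until outputs match the training text. Everything here is illustrative; real LLMs do the same thing with transformer architectures and billions of weights:

```python
import numpy as np

rng = np.random.default_rng(0)
text = "abab" * 100
vocab = sorted(set(text))                     # ['a', 'b']
ix = {c: i for i, c in enumerate(vocab)}
X = np.array([ix[c] for c in text[:-1]])      # current character
y = np.array([ix[c] for c in text[1:]])       # desired next character

W = rng.normal(0, 0.1, (len(vocab), len(vocab)))  # the "weights"

for _ in range(200):
    logits = W[X]                              # one row of weights per input char
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)          # predicted next-char probabilities
    grad = p.copy()
    grad[np.arange(len(y)), y] -= 1            # gradient of cross-entropy loss
    np.add.at(W, X, -0.1 * grad / len(y))      # nudge weights toward desired output

print(vocab[W[ix["a"]].argmax()])              # 'b': the model learned a -> b
```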

The problem? Understanding these systems’ internal processes isn’t straightforward. Unlike traditional software, sharing only the weights and code of an AI model, without the underlying training data, makes it difficult for other people to detect potential bugs or security threats.

This means previous concepts from open-source software are being applied in “ill-fitting ways to AI systems,” wrote the team, leading to confusion about the term.

Openwashing

Current “open” AI models span a range of openness, but overall, they have three main characteristics.

One is transparency, or how much detail about an AI model’s setup its creator publishes. EleutherAI’s Pythia series, for example, allows anyone to download the source code, underlying training data, and full documentation. EleutherAI also licenses the AI model for wide reuse, meeting the definition of “open source” from the Open Source Initiative, a non-profit that has defined the term as it has evolved over nearly three decades. In contrast, Meta’s Llama 3, although described as open, only allows people to build on the AI through an API—a sort of interface that lets different software communicate, without sharing the underlying code—or download just the model’s weights to tinker with, subject to restrictions on their usage.

“This is ‘openwashing’ systems that are better understood as closed,” wrote the authors.

A second characteristic is reusability, in that openly licensed data and details of an AI model can be used by other people (although often only through a cloud service—more on that later). The third characteristic, extensibility, lets people fine-tune existing models for their specific needs.

“[This] is a key feature championed particularly by corporate actors invested in open AI,” wrote the team. There’s a reason: Training AI models requires massive computing power and resources, often only available to large tech companies. Llama 3, for example, was trained on 15 trillion tokens—a unit for processing data, such as words or characters. These choke points make it hard for startups to build AI systems from scratch. Instead, they often retrain “open” systems to adapt them to a new task or run more efficiently. Stanford’s Alpaca model, based on Llama, for example, gained interest because it could run on a laptop.

There’s no doubt that many people and companies have benefited from open AI models. But to the authors, they may also be a barrier to the democratization of AI.

The Dark Side

Many large-scale open AI systems today are trained on cloud servers, the authors note. The UAE’s Technological Innovation Institute developed Falcon 40B and trained it on Amazon’s AWS servers. MosaicML’s AI is “tied to Microsoft’s Azure.” Even OpenAI has partnered with Microsoft to offer its new AI models at a price.

While cloud computing is extremely useful, it limits who can actually run AI models to a handful of large companies—and their servers. Stanford’s Alpaca eventually shut down partially due to a lack of financial resources.

Secrecy around training data is another concern. “Many large-scale AI models described as open neglect to provide even basic information about the underlying data used to train the system,” wrote the authors.

Large language models process huge amounts of data scraped from the internet, some of which is copyrighted, resulting in a number of ongoing lawsuits. When datasets aren’t readily made available, or when they’re incredibly large, it’s tough to fact-check the model’s reported performance, or if the datasets “launder others’ intellectual property,” according to the authors.

The problem gets worse with build frameworks, often developed by large tech companies to minimize the time spent “[reinventing] the wheel.” These pre-written pieces of code, workflows, and evaluation tools help developers quickly build on an AI system. However, most tweaks don’t change the model itself. In other words, whatever problems or biases exist inside the models could also propagate to downstream applications.

An AI Ecosystem

To the authors, developing AI that’s more open isn’t about evaluating one model at a time. Rather, it’s about taking the whole ecosystem into account.

Most debates on AI openness miss the larger picture. As AI advances, “the pursuit of openness on its own will be unlikely to yield much benefit,” wrote the team. Instead, the entire cycle of AI development—from setting up, training, and running AI systems to their practical uses and financial incentives—has to be considered when building open AI policies.

“Pinning our hopes on ‘open’ AI in isolation will not lead us to that world,” wrote the team.

OpenAI’s GPT-4o Makes AI Clones of Real People With Surprising Ease

AI has become uncannily good at aping human conversational capabilities. New research suggests its powers of mimicry go a lot further, making it possible to replicate specific people’s personalities.

Humans are complicated. Our beliefs, character traits, and the way we approach decisions are products of both nature and nurture, built up over decades and shaped by our distinctive life experiences.

But it appears we might not be as unique as we think. A study led by researchers at Stanford University has discovered that all it takes is a two-hour interview for an AI model to predict people’s responses to a battery of questionnaires, personality tests, and thought experiments with an accuracy of 85 percent.

While the idea of cloning people’s personalities might seem creepy, the researchers say the approach could become a powerful tool for social scientists and politicians looking to simulate responses to different policy choices.

“What we have the opportunity to do now is create models of individuals that are actually truly high-fidelity,” Stanford’s Joon Sung Park, who led the research, told New Scientist. “We can build an agent of a person that captures a lot of their complexities and idiosyncratic nature.”

AI wasn’t used only to create virtual replicas of the study participants; it also helped gather the necessary training data. The researchers got a voice-enabled version of OpenAI’s GPT-4o to interview people using a script from the American Voices Project—a social science initiative aimed at gathering responses from American families on a wide range of issues.

As well as asking preset questions, the researchers also prompted the model to ask follow-up questions based on how people responded. The model interviewed 1,052 people across the US for two hours and produced transcripts for each individual.

Using this data, the researchers created GPT-4o-powered AI agents to answer questions in the same way the human participant would. Every time an agent fielded a question, the entire interview transcript was included alongside the query, and the model was told to imitate the participant.
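In code, that setup amounts to a prompt-construction pattern along these lines, assuming the OpenAI Python SDK. The function name, prompt wording, and model settings are illustrative guesses, not the study's published code:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_agent(transcript: str, question: str) -> str:
    """Answer a survey question as the interviewed participant would."""
    response = client.chat.completions.create(
        model="gpt-4o",  # the study used GPT-4o; exact settings aren't public here
        messages=[
            {"role": "system",
             "content": "Imitate the person in the interview below and answer "
                        "as they would.\n\n--- INTERVIEW TRANSCRIPT ---\n" + transcript},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```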

To evaluate the approach, the researchers had the agents and human participants go head-to-head on a range of tests. These included the General Social Survey, which measures social attitudes to various issues; a test designed to judge how people score on the Big Five personality traits; several games that test economic decision making; and a handful of social science experiments.

Humans often respond quite differently to these kinds of tests at different times, which would throw off comparisons to the AI models. To control for this, the researchers asked the humans to complete the test twice, two weeks apart, so they could judge how consistent participants were.

When the team compared responses from the AI models against the first round of human responses, the agents were roughly 69 percent accurate. But taking into account how the humans’ responses varied between sessions, the researchers found the models hit an accuracy of 85 percent.
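The two figures imply a simple normalization: raw agent-human agreement divided by how consistently the humans agreed with themselves across the two sessions. A worked sketch (the 81 percent self-consistency value is inferred from the two reported accuracies, not stated directly in the study):

```python
raw_agreement = 0.69      # agent answers vs. participant's first session
self_consistency = 0.81   # participant's session 1 vs. session 2 (inferred)

normalized = raw_agreement / self_consistency
print(f"{normalized:.0%}")  # ~85%: accuracy relative to the human ceiling
```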

Hassaan Raza, the CEO of Tavus, a company that creates “digital twins” of customers, told MIT Technology Review that the biggest surprise from the study was how little data it took to create faithful copies of real people. Tavus normally needs a trove of emails and other information to create their AI clones.

“What was really cool here is that they show you might not need that much information,” he said. “How about you just talk to an AI interviewer for 30 minutes today, 30 minutes tomorrow? And then we use that to construct this digital twin of you.”

Creating realistic AI replicas of humans could prove a powerful tool for policymaking, Richard Whittle at the University of Salford, UK, told New Scientist, as AI focus groups could be much cheaper and quicker than ones made up of humans.

But it’s not hard to see how the same technology could be put to nefarious uses. Deepfake video has already been used to pose as a senior executive in an elaborate multi-million-dollar scam. The ability to mimic a target’s entire personality would likely turbocharge such efforts.

Either way, the research suggests that machines that can realistically imitate humans in a wide range of settings are imminent.

Image Credit: Richmond Fajardo on Unsplash

Niantic Is Training a Giant ‘Geospatial’ AI on Pokémon Go Data

If you want to see what’s next in AI, just follow the data. ChatGPT and DALL-E trained on troves of internet data. Generative AI is making inroads in biotechnology and robotics thanks to existing or newly assembled datasets. One way to glance ahead, then, is to ask: What colossal datasets are still ripe for the picking?

Recently, a new clue emerged.

In a blog post, gaming company Niantic said it’s training a new AI on millions of real-world images collected by Pokémon Go players and in its Scaniverse app. Inspired by the large language models powering chatbots, they call their algorithm a “large geospatial model” and hope it’ll be as fluent in the physical world as ChatGPT is in the world of language.

Follow the Data

This moment in AI is defined by algorithms that generate language, images, and increasingly, video. With OpenAI’s DALL-E and ChatGPT, anyone can use everyday language to get a computer to whip up photorealistic images or explain quantum physics. Now, the company’s Sora algorithm is applying a similar approach to video generation. Others are competing with OpenAI, including Google, Meta, and Anthropic.

The crucial insight that gave rise to these models: The rapid digitization of recent decades is useful for more than entertaining and informing us humans—it’s food for AI too. Few would have viewed the internet in this way at its advent, but in hindsight, humanity has been busy assembling an enormous educational dataset of language, images, code, and video. For better or worse—there are several copyright infringement lawsuits in the works—AI companies scraped all that data to train powerful AI models.

Now that they know the basic recipe works well, companies and researchers are looking for more ingredients.

In biotech, labs are training AI on collections of molecular structures built over decades and using it to model and generate proteins, DNA, RNA, and other biomolecules to speed up research and drug discovery. Others are testing large AI models in self-driving cars and warehouse and humanoid robots—both as a better way to tell robots what to do, but also to teach them how to navigate and move through the world.

Of course, for robots, fluency in the physical world is crucial. Just as language is endlessly complex, so too are the situations a robot might encounter. Robot brains coded by hand can never account for all the variation. That’s why researchers are now building large datasets with robots in mind. But they’re nowhere near the scale of the internet, where billions of humans have been working in parallel for a very long time.

Might there be an internet for the physical world? Niantic thinks so. It’s called Pokémon Go. But the hit game is only one example. Tech companies have been creating digital maps of the world for years. Now, it seems likely those maps will find their way into AI.

Pokémon Trainers

Released in 2016, Pokémon Go was an augmented reality sensation.

In the game, players track down digital characters—or Pokémon—that have been placed all over the world. Using their phones as a kind of portal, players see characters superimposed on a physical location—say, sitting on a park bench or loitering by a movie theater. A newer offering, Pokémon Playground, allows users to embed characters at locations for other players. All this is made possible by the company’s detailed digital maps.

Niantic’s Visual Positioning System (VPS) can determine a phone’s position down to the centimeter from a single image of a location. In part, VPS assembles 3D maps of locations classically, but the system also relies on a network of machine learning algorithms—one or more per location—trained on years of player images and scans taken at various angles, times of day, and seasons and stamped with a position in the world.

“As part of Niantic’s Visual Positioning System (VPS), we have trained more than 50 million neural networks, with more than 150 trillion parameters, enabling operation in over a million locations,” the company wrote in its recent blog post.
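One way to picture that architecture is as a registry of many small per-location models, each answering only for its own mapped place. The sketch below is purely conceptual; the class and method names are invented, not Niantic's API:

```python
from dataclasses import dataclass, field

@dataclass
class LocationModel:
    """Stand-in for one of the ~50 million per-location networks."""
    location_id: str

    def estimate_pose(self, image_bytes: bytes) -> tuple[float, float, float]:
        # In reality: a neural net trained on years of scans of this one place,
        # returning position to centimeter precision. Dummy values here.
        return (0.0, 0.0, 0.0)

@dataclass
class VPSRegistry:
    """One or more small models per mapped location, not one global model."""
    models: dict[str, LocationModel] = field(default_factory=dict)

    def localize(self, location_id: str, image_bytes: bytes):
        return self.models[location_id].estimate_pose(image_bytes)

registry = VPSRegistry()
registry.models["park_fountain"] = LocationModel("park_fountain")
print(registry.localize("park_fountain", b"<photo>"))  # (0.0, 0.0, 0.0)
```

A single foundation model would replace this per-place lookup with one network that generalizes across every location it has seen.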

Now, Niantic wants to go further.

Instead of millions of individual neural networks, they want to use Pokémon Go and Scaniverse data to train a single foundation model. Whereas individual models are constrained by the images they’ve been fed, the new model would generalize across all of them. Confronted with the front of a church, for example, it would draw on all the churches and angles it’s seen—front, side, rear—to visualize parts of the church it hasn’t been shown.

This is a bit like what we humans do as we navigate the world. We might not be able to see around a corner, but we can guess what’s there—it might be a hallway, the side of a building, or a room—and plan for it, based on our point of view and experience.

Niantic writes that a large geospatial model would allow it to improve augmented reality experiences. But it also believes such a model might power other applications, including in robotics and autonomous systems.

Getting Physical

Niantic believes it’s in a unique position because it has an engaged community contributing a million new scans a week. In addition, those scans are taken from a pedestrian’s point of view, rather than from the street, as with Google Maps imagery or self-driving car data. They’re not wrong.

If we take the internet as an example, then the most powerful new datasets may be collected by millions, or even billions, of humans working in concert.

At the same time, Pokémon Go isn’t comprehensive. Though locations span continents, they’re sparse in any given place and whole regions are completely dark. Further, other companies, perhaps most notably, Google, have long been mapping the globe. But unlike the internet, these datasets are proprietary and splintered.

Whether that matters—that is, whether an internet-sized dataset is needed to make a generalized AI that’s as fluent in the physical world as LLMs are in the verbal—isn’t clear.

But it’s possible a more complete dataset of the physical world arises from something like Pokémon Go, only supersized. This has already begun with smartphones, which have sensors to take images, videos, and 3D scans. In addition to AR apps, users are increasingly being incentivized to use these sensors with AI—like taking a picture of a fridge and asking a chatbot what to cook for dinner. New devices, like AR glasses, could expand this kind of usage, yielding a data bonanza for the physical world.

Of course, collecting data online is already controversial, and privacy is a big issue. Extending those problems to the real world is less than ideal.

After 404 Media published an article on the topic, Niantic added a note, “This scanning feature is completely optional—people have to visit a specific publicly-accessible location and click to scan. This allows Niantic to deliver new types of AR experiences for people to enjoy. Merely walking around playing our games does not train an AI model.” Other companies, however, may not be as transparent about data collection and use.

It’s also not certain new algorithms inspired by large language models will be straightforward. MIT, for example, recently built a new architecture aimed specifically at robotics. “In the language domain, the data are all just sentences,” Lirui Wang, the lead author of a paper describing the work, told TechCrunch. “In robotics, given all the heterogeneity in the data, if you want to pretrain in a similar manner, we need a different architecture.”

Regardless, researchers and companies will likely continue exploring areas where LLM-like AI may be applicable. And perhaps as each new addition matures, it will be a bit like adding a brain region—stitch them together and you get machines that think, speak, write, and move through the world as effortlessly as we do.

Image Credit: Kamil Switalski on Unsplash

‘Droidspeak’: AI Agents Now Have Their Own Language Thanks to Microsoft

Getting AIs to work together could be a powerful force multiplier for the technology. Now, Microsoft researchers have invented a new language to help their models talk to each other faster and more efficiently.

AI agents are the latest buzzword in Silicon Valley. These are AI models that can carry out complex, multi-step tasks autonomously. But looking further ahead, some see a future where multiple AI agents collaborate to solve even more challenging problems.

Given that these agents are powered by large language models (LLMs), getting them to work together usually relies on agents speaking to each other in natural language, often English. But despite their expressive power, human languages might not be the best medium of communication for machines that fundamentally operate in ones and zeros.

This prompted researchers from Microsoft to develop a new method of communication that allows agents to talk to each other in the high-dimensional mathematical language underpinning LLMs. They’ve named the new approach Droidspeak—a reference to the beep and whistle-based language used by robots in Star Wars—and in a preprint paper published on the arXiv, the Microsoft team reports it enabled models to communicate 2.78 times faster with little accuracy lost.

Typically, when AI agents communicate using natural language, they not only share the output of the current step they’re working on, but also the entire conversation history leading up to that point. Receiving agents must process this big chunk of text to understand what the sender is talking about.

This creates considerable computational overhead, which grows rapidly if agents engage in a repeated back-and-forth. Such exchanges can quickly become the biggest contributor to communication delays, say the researchers, limiting the scalability and responsiveness of multi-agent systems.

To break the bottleneck, the researchers devised a way for models to directly share the data created in the computational steps preceding language generation. In principle, the receiving model would use this directly rather than processing language and then creating its own high-level mathematical representations.

However, transferring the data between models isn’t simple. Different models represent language in very different ways, so the researchers focused on communication between versions of the same underlying LLM.

Even then, they had to be smart about what kind of data to share. Some data can be reused directly by the receiving model, while other data needs to be recomputed. The team devised a way of working this out automatically to squeeze the biggest computational savings from the approach.
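A toy illustration of why this works for versions of the same model: if sender and receiver share the same weights, the sender's internal representation of the conversation can be handed over and reused directly, skipping the expensive re-encoding step. Here a fixed random projection stands in for an LLM's activations, and everything is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED = rng.normal(size=(256, 64))  # shared weights: same underlying model

def encode(text: str) -> np.ndarray:
    """Expensive step: turn conversation history into internal activations."""
    counts = np.zeros(256)
    for b in text.encode():
        counts[b] += 1
    return counts @ EMBED

history = "user: plan a trip...\nagent A: step 1..."  # long shared context

# Natural-language handoff: agent B re-encodes the whole history itself.
slow = encode(history)

# Droidspeak-style handoff: agent A sends its activations; B reuses them.
cached = encode(history)        # computed once, by the sender
fast = cached                   # receiver skips re-encoding entirely
print(np.allclose(slow, fast))  # True: same model, same representation
```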

Philip Feldman at the University of Maryland, Baltimore County told New Scientist that the resulting communication speed-ups could help multi-agent systems tackle bigger, more complex problems than is possible using natural language.

But the researchers say there’s still plenty of room for improvement. For a start, it would be helpful if models of different sizes and configurations could communicate. And they could squeeze out even bigger computational savings by compressing the intermediate representations before transferring them between models.

However, it seems likely this is just the first step towards a future in which the diversity of machine languages rivals that of human ones.

Image Credit: Shawn Suttle from Pixabay

Poetry by History’s Greatest Poets or AI? People Can’t Tell the Difference—and Even Prefer the Latter. What Gives?

Here are some lines Sylvia Plath never wrote:

The air is thick with tension,
My mind is a tangled mess,
The weight of my emotions
Is heavy on my chest.

This apparently Plath-like verse was produced by GPT-3.5 in response to the prompt “write a short poem in the style of Sylvia Plath.”

The stanza hits the key points readers may expect of Plath’s poetry, and perhaps a poem more generally. It suggests a sense of despair as the writer struggles with internal demons. “Mess” and “chest” are a near-rhyme, which reassures us that we are in the realm of poetry.

According to a new paper in Scientific Reports, non-expert readers of poetry cannot distinguish poetry written by AI from that written by canonical poets. Moreover, general readers tend to prefer poetry written by AI—at least until they are told it is written by a machine.

In the study, AI was used to generate poetry “in the style of” 10 poets: Geoffrey Chaucer, William Shakespeare, Samuel Butler, Lord Byron, Walt Whitman, Emily Dickinson, TS Eliot, Allen Ginsberg, Sylvia Plath, and Dorothea Lasky.

Participants were presented with 10 poems in random order, five from a real poet and five AI imitations. They were then asked whether they thought each poem was AI or human, rating their confidence on a scale of 1 to 100.

A second group of participants was exposed to three different scenarios. Some were told that all the poems they were given were human. Some were told they were reading only AI poems. Some were not told anything.

They were then presented with five human and five AI poems and asked to rate them on a seven-point scale, from extremely bad to extremely good. The participants who were told nothing were also asked to guess whether each poem was human or AI.

The researchers found that AI poems scored higher than their human-written counterparts in attributes such as “creativity,” “atmosphere,” and “emotional quality.”

The AI “Plath” poem quoted above is one of those included in the study, set against several she actually wrote.

A Sign of Quality?

As a lecturer in English, these outcomes do not surprise me. Poetry is the literary form that my students find most unfamiliar and difficult. I am sure this holds true of wider society as well.

While most of us have been taught poetry at some point, likely in high school, our reading does not tend to go much beyond that. This is despite the ubiquity of poetry. We see it every day: circulated on Instagram, plastered on coffee cups, and printed in greeting cards.

The researchers suggest that “by many metrics, specialized AI models are able to produce high-quality poetry.” But they don’t interrogate what we actually mean by “high-quality.”

In my view, the results of the study are less testaments to the “quality” of machine poetry than to the wider difficulty of giving life to poetry. It takes reading and rereading to experience what literary critic Derek Attridge has called the “event” of literature, where “new possibilities of meaning and feeling” open within us. In the most significant kinds of literary experiences, “we feel pulled along by the work as we push ourselves through it.”

Attridge quotes philosopher Walter Benjamin to make this point: Literature “is not statement or the imparting of information.”

Philosopher Walter Benjamin argued that literature is not simply the imparting of information. Image Credit: Public domain, via Wikimedia Commons

Yet pushing ourselves through remains as difficult as ever—perhaps more so in a world where we expect instant answers. Participants favored poems that were easier to interpret and understand.

When readers say they prefer AI poetry, then, they would seem to be registering their frustration when faced with writing that does not yield to their attention. If we do not know how to begin with poems, we end up relying on conventional “poetic” signs to make determinations about quality and preference.

This is of course the realm of GPT, which writes formally adequate sonnets in seconds. The large language models used in AI are success-orientated machines that aim to satisfy general taste, and they are effective at doing so. The machines give us the poems we think we want: Ones that tell us things.

How Poems Think

The work of teaching is to help students attune themselves to how poems think, poem by poem and poet by poet, so they can gain access to poetry’s specific intelligence. In my introductory course, I take about an hour to work through Sylvia Plath’s “Morning Song.” I have spent 10 minutes or more on the opening line: “Love set you going like a fat gold watch.”

How might a “watch” be connected to “set you going”? How can love set something going? What does a “fat gold watch” mean to you—and how is it different from a slim silver one? Why “set you going” rather than “led to your birth”? And what does all this mean in a poem about having a baby, and all the ambivalent feelings this may produce in a mother?

In one of the real Plath poems that was included in the survey, “Winter Landscape, With Rooks,” we observe how her mental atmosphere unfurls around the waterways of the Cambridgeshire Fens in February:

Water in the millrace, through a sluice of stone,
plunges headlong into that black pond
where, absurd and out-of-season, a single swan
floats chaste as snow, taunting the clouded mind
which hungers to haul the white reflection down.

How different is this to GPT’s Plath poem? The achievement of the opening of “Winter Landscape, With Rooks” is how it intricately explores the connection between mental events and place. Given the wider interest of the poem in emotional states, its details seem to convey the tumble of life’s events through our minds.

Our minds are turned by life just as the mill is turned by water; these experiences and mental processes accumulate in a scarcely understood “black pond.”

Intriguingly, the poet finds that this metaphor, well constructed though it may be, does not quite work. This is not because of a failure of language, but because of the landscape she is trying to turn into art, which is refusing to submit to her emotional atmosphere. Despite everything she feels, a swan floats on serenely—even if she “hungers” to haul its “white reflection down.”

I mention these lines because they turn around the Plath-like poem of GPT-3.5. They remind us of the unexpected outcomes of giving life to poems. Plath acknowledges not just the weight of her despair, but the absurd figure she may be within a landscape she wants to reflect her sadness.

She compares herself to the bird that gives the poem its title:

feathered dark in thought, I stalk like a rook,
brooding as the winter night comes on.

These lines are unlikely to register highly in the study’s terms of literary response—“beautiful,” “inspiring,” “lyrical,” “meaningful,” and so on. But there is a kind of insight to them. Plath is the source of her torment, “feathered” as she is with her “dark thoughts.” She is “brooding,” trying to make the world into her imaginative vision.

Sylvia Plath. Image Credit: RBainbridge2000, via Wikimedia Commons, CC BY

The authors of the study are both right and wrong when they write that AI can “produce high-quality poetry.” The preference the study reveals for AI poetry over that written by humans does not suggest that machine poems are of a higher quality. The AI models can produce poems that rate well on certain “metrics.” But the event of reading poetry is ultimately not one in which we arrive at standardized criteria or outcomes.

Instead, as we engage in imaginative tussles with poems, both we and the poem are newly born. So the outcome of the research is that we have a highly specified, well-thought-out examination of how people who know little about poetry respond to poems. But it fails to explore how poetry can be enlivened by meaningful shared encounters.

Spending time with poems of any kind, attending to their intelligence and the acts of sympathy and speculation required to confront their challenges, is as difficult as ever. As the Plath of GPT-3.5 puts it:

My mind is a tangled mess,
[…]
I try to grasp at something solid.


This article is republished from The Conversation under a Creative Commons license. Read the original article.

A ChatGPT-Like AI Can Now Design Whole New Genomes From Scratch https://singularityhub.com/2024/11/18/a-chatgpt-like-ai-can-now-design-entirely-new-genomes-from-scratch/ Mon, 18 Nov 2024 22:59:39 +0000 https://singularityhub.com/?p=159515 All life on Earth is written with four DNA “letters.” An AI just used those letters to dream up a completely new genome from scratch.

Called Evo, the AI was inspired by the large language models, or LLMs, underlying popular chatbots such as OpenAI’s ChatGPT and Anthropic’s Claude. These models have taken the world by storm for their prowess at generating human-like responses. From simple tasks, such as defining an obscure word, to summarizing scientific papers or spewing verses fit for a rap battle, LLMs have entered our everyday lives.

If LLMs can master written languages—could they do the same for the language of life?

This month, a team from Stanford University and the Arc Institute put the theory to the test. Rather than training Evo on content scraped from the internet, they trained the AI on nearly three million genomes—amounting to billions of lines of genetic code—from various microbes and bacteria-infecting viruses.

Evo was better than previous AI models at predicting how mutations to genetic material—DNA and RNA—could alter function. The AI also got creative, dreaming up several new components for the gene editing tool, CRISPR. Even more impressively, the AI generated a genome more than a megabase long—roughly the size of some bacterial genomes.

“Overall, Evo represents a genomic foundation model,” wrote Christina Theodoris at the Gladstone Institute in San Francisco, who was not involved in the work.

Having learned the genomic vocabulary, algorithms like Evo could help scientists probe evolution, decipher our cells’ inner workings, tackle biological mysteries, and fast-track synthetic biology by designing complex new biomolecules.

The DNA Multiverse

Compared to the English alphabet’s 26 letters, DNA has only four: A, T, C, and G. These ‘letters’ are shorthand for the four molecules—adenine (A), thymine (T), cytosine (C), and guanine (G)—that, combined, spell out our genes. If LLMs can conquer languages and generate new prose, rewriting the genetic handbook with only four letters should be a piece of cake.

Not quite. Human language is organized into words, phrases, and punctuated into sentences to convey information. DNA, in contrast, is more continuous, and genetic components are complex. The same DNA letters carry “parallel threads of information,” wrote Theodoris.

The most familiar is DNA’s role as genetic carrier. A specific combination of three DNA letters, called a codon, encodes a protein building block. These building blocks are strung together into the proteins that make up our tissues and organs and direct the inner workings of our cells.
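To make the codon idea concrete, here is a minimal Python sketch; the toy table holds just a few of the 64 real codon assignments, and actual cellular translation is far more elaborate.

```python
# A codon is a run of three DNA letters that specifies one amino acid.
# This toy table holds just a few of the 64 real codon assignments.
CODON_TABLE = {
    "ATG": "Met",   # methionine, also the usual "start" signal
    "GCT": "Ala",   # alanine
    "AAA": "Lys",   # lysine
    "TGG": "Trp",   # tryptophan
    "TAA": "STOP",  # one of the three stop codons
}

def translate(dna):
    """Read DNA three letters at a time, mapping each codon to an amino acid."""
    residues = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "?")
        if amino_acid == "STOP":
            break
        residues.append(amino_acid)
    return residues

print(translate("ATGGCTAAATAA"))  # ['Met', 'Ala', 'Lys']
```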

But the same genetic sequence, depending on its structure, can also recruit the molecules needed to turn codons into proteins. And sometimes, the same DNA letters can turn one gene into different proteins depending on a cell’s health and environment or even turn the gene off.

In other words, DNA letters contain a wealth of information about the genome’s complexity. And any changes can jeopardize a protein’s function, resulting in genetic disease and other health problems. This makes it critical for AI to work at the resolution of single DNA letters.

But it’s hard for AI to capture multiple threads of information on a large scale by analyzing genetic letters alone, partially due to high computational costs. Like ancient Roman scripts, DNA is a continuum of letters without clear punctuation. So, it could be necessary to “read” whole strands to gain an overall picture of their structure and function—that is, to decipher meaning.

Previous attempts have “bundled” DNA letters into blocks—a bit like making artificial words. While easier to process, these methods disrupt the continuity of DNA, resulting in the retention of “some threads of information at the expense of others,” wrote Theodoris.
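As a rough illustration of the difference, the sketch below contrasts “bundled” k-mer tokens with the single-letter tokens Evo operates on; the exact tokenizers used by earlier models vary.

```python
# Two ways to tokenize the same DNA for a language model. Bundling letters
# into fixed-size blocks (k-mers) shortens the sequence but blurs
# single-letter boundaries; character-level tokens keep single-nucleotide
# resolution at a higher computational cost.

def kmer_tokens(dna, k=3):
    """Non-overlapping k-mer 'words', as in earlier bundling approaches."""
    return [dna[i:i + k] for i in range(0, len(dna) - k + 1, k)]

def char_tokens(dna):
    """Single-letter tokens, the resolution Evo operates at."""
    return list(dna)

seq = "ATGGCTAAA"
print(kmer_tokens(seq))  # ['ATG', 'GCT', 'AAA'] -- 3 tokens
print(char_tokens(seq))  # ['A', 'T', 'G', ...]  -- 9 tokens
```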

Building Foundations

Evo addressed these problems head on. Its designers aimed to preserve all threads of information, while operating at single-DNA-letter resolution with lower computational costs.

The trick was to give Evo a broader context for any given chunk of the genome by leveraging an architecture from a family of models called StripedHyena. Compared to GPT-4 and other AI models, StripedHyena is designed to be faster and more capable of processing large inputs—for example, long stretches of DNA. This broadened Evo’s so-called “search window,” allowing it to better find patterns across a larger genetic landscape.
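For a back-of-the-envelope sense of why this matters at genome scale, compare how the cost of mixing a length-L sequence grows: self-attention scales roughly with L squared, while convolution-based operators like those in the Hyena family scale closer to L log L. Constants and hardware effects are omitted here, so these are trends, not benchmarks.

```python
import math

# Constants-free trend comparison of sequence-mixing cost at context length L.
for L in [1_000, 100_000, 1_000_000]:
    print(f"L={L:>9,}  attention ~ {L**2:.1e}  subquadratic ~ {L * math.log2(L):.1e}")
```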

The researchers then trained the AI on a database of nearly three million genomes from bacteria and viruses that infect bacteria, known as phages. It also learned from plasmids, circular bits of DNA often found in bacteria that transmit genetic information between microbes, spurring evolution and perpetuating antibiotic resistance.

Once Evo was trained, the team pitted it against other AI models to predict how mutations in a given genetic sequence might impact the sequence’s function, such as coding for proteins. Even though it was never told which genetic letters form codons, Evo outperformed an AI model that had been explicitly trained to recognize protein-coding DNA letters.
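A common zero-shot recipe for this kind of prediction is to score a mutation by how much it changes the model’s likelihood of the sequence. The sketch below uses hypothetical names, not Evo’s actual interface.

```python
# Sketch of zero-shot mutation-effect prediction with a genomic language
# model. `GenomicLM` and its `log_likelihood` method are illustrative
# stand-ins, not Evo's real API.

class GenomicLM:
    def log_likelihood(self, seq):
        """Return the model's total log-probability of a DNA sequence."""
        raise NotImplementedError  # supplied by the trained model

def mutation_effect(model, seq, pos, alt):
    """Score a point mutation as the log-likelihood change it causes.
    Strongly negative scores suggest the change disrupts function."""
    mutant = seq[:pos] + alt + seq[pos + 1:]
    return model.log_likelihood(mutant) - model.log_likelihood(seq)
```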

Remarkably, Evo also predicted the effect of mutations on a wide variety of RNA molecules—for example, those regulating gene expression, shuttling protein building blocks to the cell’s protein-making factory, and acting as enzymes to fine-tune protein function.

Evo seemed to have gained a “fundamental understanding of DNA grammar,” wrote Theodoris, making it a perfect tool to create “meaningful” new genetic code.

To test this, the team used the AI to design new versions of the gene editing tool CRISPR. The task is especially difficult as the system contains two elements that work together—a guide RNA molecule and a pair of protein “scissors” called Cas. Evo generated millions of potential Cas proteins and their accompanying guide RNA. The team picked 11 of the most promising combinations, synthesized them in the lab, and tested their activity in test tubes.
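The workflow is essentially generate-and-screen: sample many candidates from the model, filter cheaply in silico, and send only a shortlist to the lab. A minimal sketch, with all model methods as hypothetical stand-ins:

```python
import heapq

def screen_candidates(model, prompt, n_samples, shortlist=11):
    """Sample candidate Cas/guide-RNA pairs, filter cheaply, keep the best few."""
    scored = []
    for _ in range(n_samples):
        candidate = model.sample(prompt)        # one Cas protein + guide RNA
        if not looks_plausible(candidate):      # cheap checks: length, motifs...
            continue
        scored.append((model.score(candidate), candidate))
    # The shortlist (11 combinations, in the study) goes on to lab synthesis.
    return heapq.nlargest(shortlist, scored, key=lambda pair: pair[0])

def looks_plausible(candidate):
    """Placeholder for in-silico sanity checks before any wet-lab work."""
    return True
```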

One stood out: paired with its guide RNA partner, the AI-designed protein—a variant of Cas9—cleaved its DNA target. These designer biomolecules represent the “first examples” of codesign between proteins and DNA or RNA with a language model, wrote the team.

The team also asked Evo to generate a DNA sequence similar in length to some bacterial genomes and compared the results to natural genomes. The designer genome contained some essential genes for cell survival, but with myriad unnatural characteristics preventing it from being functional. This suggests the AI can only make a “blurry image” of a genome, one that contains key elements, but lacks finer-grained details, wrote the team.

Like other LLMs, Evo sometimes “hallucinates,” spewing CRISPR systems with no chance of working. Despite these problems, the AI suggests future LLMs could predict and generate genomes on a broader scale. The tool could also help scientists examine long-range genetic interactions in microbes and phages, potentially sparking insights into how we might rewire their genomes to produce biofuels, plastic-eating bugs, or medicines.

It’s yet unclear whether Evo could decipher or generate far longer genomes, like those in plants, animals, or humans. If the model can scale, however, it “would have tremendous diagnostic and therapeutic implications for disease,” wrote Theodoris.

Image Credit: Warren Umoh on Unsplash

MIT’s New Robot Dog Learned to Walk and Climb in a Simulation Whipped Up by Generative AI https://singularityhub.com/2024/11/15/mits-new-robot-dog-learned-to-walk-and-climb-in-a-simulation-whipped-up-by-generative-ai/ Fri, 15 Nov 2024 23:09:02 +0000 https://singularityhub.com/?p=159498 A big challenge when training AI models to control robots is gathering enough realistic data. Now, researchers at MIT have shown they can train a robot dog using 100 percent synthetic data.

Traditionally, robots have been hand-coded to perform particular tasks, but this approach results in brittle systems that struggle to cope with the uncertainty of the real world. Machine learning approaches that train robots on real-world examples promise to create more flexible machines, but gathering enough training data is a significant challenge.

One potential workaround is to train robots using computer simulations of the real world, which makes it far simpler to set up novel tasks or environments for them. But this approach is bedeviled by the “sim-to-real gap”—these virtual environments are still poor replicas of the real world and skills learned inside them often don’t translate.

Now, MIT CSAIL researchers have found a way to combine simulations and generative AI to enable a robot, trained on zero real-world data, to tackle a host of challenging locomotion tasks in the physical world.

“One of the main challenges in sim-to-real transfer for robotics is achieving visual realism in simulated environments,” Shuran Song from Stanford University, who wasn’t involved in the research, said in a press release from MIT.

“The LucidSim framework provides an elegant solution by using generative models to create diverse, highly realistic visual data for any simulation. This work could significantly accelerate the deployment of robots trained in virtual environments to real-world tasks.”

Leading simulators used to train robots today can realistically reproduce the kind of physics robots are likely to encounter. But they are not so good at recreating the diverse environments, textures, and lighting conditions found in the real world. This means robots relying on visual perception often struggle in less controlled environments.

To get around this, the MIT researchers used text-to-image generators to create realistic scenes and combined these with a popular simulator called MuJoCo to map geometric and physics data onto the images. To increase the diversity of images, the team also used ChatGPT to create thousands of prompts for the image generator covering a huge range of environments.

After generating these realistic environmental images, the researchers converted them into short videos from a robot’s perspective using another system they developed called Dreams in Motion. This computes how each pixel in the image would shift as the robot moves through an environment, creating multiple frames from a single image.
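The underlying geometry is standard image reprojection: back-project each pixel into 3D using its depth, move the camera, and project it back. Below is a minimal numpy sketch assuming a pinhole camera, pure translation, and positive depth everywhere; it illustrates the idea, not the actual Dreams in Motion code.

```python
import numpy as np

def reproject(depth, fx, fy, cx, cy, cam_motion):
    """Return new (u, v) pixel coordinates after the camera translates by cam_motion.

    depth: (h, w) array of per-pixel depths (assumed > 0);
    fx, fy, cx, cy: pinhole camera intrinsics; cam_motion: length-3 translation.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    # Back-project pixels to 3D camera coordinates using the depth map.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    z = depth.astype(np.float64)
    # Express the points in the moved camera's frame (pure translation).
    x, y, z = x - cam_motion[0], y - cam_motion[1], z - cam_motion[2]
    # Project back to the image plane to get each pixel's new position.
    return np.stack([fx * x / z + cx, fy * y / z + cy], axis=-1)
```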

The researchers dubbed this data-generation pipeline LucidSim and used it to train an AI model to control a quadruped robot using just visual input. The robot learned a series of locomotion tasks, including going up and down stairs, climbing boxes, and chasing a soccer ball.

The training process was split into two stages. First, the team trained their model on data generated by an expert AI system that had access to detailed terrain information as it attempted the same tasks. This gave the model enough grasp of the tasks to attempt them in a simulation based on the data from LucidSim, which generated still more data. The team then retrained the model on the combined data to create the final robotic control policy.
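In outline, and with illustrative names only rather than the team’s actual code, the recipe looks something like this:

```python
def train_policy(policy, expert_data, lucidsim_data):
    """Two-stage recipe: imitate the privileged expert, then add synthetic visuals.

    `policy` is any object with a .fit(dataset) method; datasets are lists of
    (image, action) pairs. All names here are illustrative stand-ins.
    """
    policy.fit(expert_data)                   # stage 1: distill the expert teacher
    policy.fit(expert_data + lucidsim_data)   # stage 2: retrain on combined data
    return policy
```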

The approach matched or outperformed the expert AI system on four out of the five tasks in real-world tests, despite relying on just visual input. And on all the tasks, it significantly outperformed a model trained using “domain randomization”—a leading simulation approach that increases data diversity by applying random colors and patterns to objects in the environment.
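For contrast, domain randomization in its simplest form just perturbs each simulated scene’s appearance so the policy can’t overfit to any single look. A tiny sketch, with the scene API as an illustrative stand-in:

```python
import random

def randomize_scene(scene, texture_bank):
    """Assign every object a random color and texture before rendering.
    `scene.objects` stands in for whatever scene graph the simulator exposes."""
    for obj in scene.objects:
        obj.color = [random.random() for _ in range(3)]  # random RGB
        obj.texture = random.choice(texture_bank)
    return scene
```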

The researchers told MIT Technology Review their next goal is to train a humanoid robot on purely synthetic data generated by LucidSim. They also hope to use the approach to improve the training of robotic arms on tasks requiring dexterity.

Given the insatiable appetite for robot training data, methods like this that can provide high-quality synthetic alternatives are likely to become increasingly important in the coming years.

Image Credit: MIT CSAIL
