I write this blog post to explore an emerging frontier in the way we consume information – one that will have profound implications for the way we read, learn, and interact with complex content. This new frontier is heralded by the advent of large language models (LLMs), powerful generative AI that can understand and generate text, answer questions, and translate languages.
On June 10, 2023, I participated in a private gathering in Silicon Valley hosted by AI experts Richard Socher and Nancy Xu. This event brought together some of the brightest minds in artificial intelligence and leaders in the tech industry, including notable figures such as Google co-founder Sergey Brin and former Twitter CEO & CTO Parag Agrawal.
There, I discussed my vision of how interactive, LLM-powered AI agents will revolutionize the way we interact with content, from reading scientific papers and books to interpreting images and watching videos. In this blog post, I share that vision with you, exploring the potential of these AI agents to transform how we consume and understand information.
As someone who has spent most of my career leading product and engineering at media organizations including The New York Times, The Wall Street Journal, Condé Nast, and Hearst, I’ve developed a deep appreciation for how critical high-quality information is to our society. I’ve seen first-hand how the digitization of content, from the rise of the internet to the creation of multimedia platforms, has transformed the way we engage with information. Having applied machine learning to numerous use cases in the media industry since my product engineering days at Knight Ridder in the 1990s, I’ve also seen the challenges and limitations of how readers consume content. Having had to trudge through complex documents in both professional and personal capacities, I am excited about how LLMs can change our interaction with information. I’ve also been an investor in and advisor to the AI startup you.com since 2021, before ChatGPT brought LLMs into the mainstream.
Elevating Textual Comprehension with LLM-Powered Agents
LLMs can usher in a paradigm shift by turning passive documents into interactive experiences. Let’s expand on the examples mentioned above and explore a wider array of potential uses:
- Explaining complex content: LLM-powered agents can translate complex language from a scientific paper into simple, everyday language.
- Personalized extraction and expansion: These agents can highlight and expand on the parts of a paper or a book that a reader finds most interesting or most relevant to their purpose of reading.
- Making complex documents accessible: By providing explanations, summaries, and simplified versions, LLMs can make technical, financial, scientific, and other complex documents understandable to a layperson.
- Providing historical and cultural context: Reading historical texts or foreign literature? LLMs can provide relevant historical, social, and cultural context, enriching your understanding of the text.
- Augmenting interactive reading: While reading a novel, you could ask the LLM-powered agent about the author’s style, themes in their works, or real-world analogs to the plot points.
- Assistance with language learning: LLMs can help language learners read texts in a new language, providing translations and explanations of idioms and cultural nuances on demand.
- Real-time fact-checking: As you read a news article or a blog post, the agent could verify the facts and provide additional perspectives or data on the subject.
- Supporting legal and regulatory comprehension: Legal and regulatory documents can be tough to navigate. LLM-powered agents can make these documents more accessible by explaining terms, summarizing sections, and relating clauses to relevant laws or precedents.
- Exploring author influences: While reading a piece of literature, the LLM agent can help you explore the influences on the author from other works of literature or from historical events.
- Simulating author interaction: Ever wanted to ask an author about their thoughts while writing a certain section or what they might have done differently? While it can’t replace actual interaction with the author, an LLM agent can generate plausible responses based on their other writings.
Exploring Multimedia Applications
LLMs can be multimodal, i.e., not limited to just text. They can also interact with and explain content from various media:
- Interpreting art: From explaining the history and context of a painting to exploring the techniques used by the artist, an LLM agent can enrich the experience of viewing art.
- Decoding symbolism in movies: LLMs can assist viewers in understanding complex films by explaining underlying themes, symbolism, or cinematographic techniques.
- Understanding infographics: They can make complex infographics easier to understand by breaking them down, explaining each part in simple language, and letting you query them in natural language.
- Interactive cookbook: In a recipe, the LLM agent can explain culinary terms, offer substitutes for ingredients, or even suggest adjustments based on dietary preferences and available tools and ingredients.
- Music appreciation: When listening to a piece of music, the LLM agent could provide relevant information about the composer, the context in which it was composed, or the music theory behind it.
LLM-Powered Agents: A Revolution in Information Consumption
The potential of LLMs to revolutionize the way we consume information is immense. They are not just about making information easier to access or understand but about fundamentally changing how we interact with information – transforming it into a dialogue instead of a monologue.
In my own reading experiences, I’ve often wished for the ability to delve deeper into certain topics, to understand complex concepts more easily, or to have a guide that would enrich my comprehension with relevant context. These are not just my desires but reflect the limitations of our current modes of information consumption.
LLM-powered intelligent agents can be this guide. They can help us navigate the oceans of information that we encounter daily, not just by making it more accessible but by making it more meaningful. They can tailor explanations to our understanding, bring in context that enriches our comprehension, and make the process of learning more engaging.
As a learning experiment, I developed and open-sourced my own such agent, Ragbot.AI, an augmented brain and assistant that uses multiple LLMs along with a technique called Retrieval Augmented Generation (RAG) to enable its human users to interact with documents.
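To make the idea behind RAG a little more concrete, here is a minimal sketch in Python. It is not Ragbot.AI's actual implementation. The function names (`tokenize`, `retrieve`, `build_prompt`) and the sample passages are hypothetical, and a simple word-overlap similarity stands in for the learned embeddings a real system would more likely use. The sketch shows only the two steps that give the technique its name: retrieving the passages most relevant to a question, and augmenting the prompt with them before it is sent to an LLM (the LLM call itself is left out).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k passages most similar to the question.

    Assumption: word overlap is a stand-in for embedding-based similarity.
    """
    q_vec = Counter(tokenize(question))
    scored = [(cosine_similarity(q_vec, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to an LLM (the call is omitted)."""
    context_block = "\n".join(f"- {p}" for p in context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Hypothetical passages standing in for chunks of a user's documents.
    passages = [
        "The mitochondria is the powerhouse of the cell.",
        "Interest rates affect bond prices.",
        "Photosynthesis converts light into chemical energy in plants.",
    ]
    question = "How do interest rates affect bond prices?"
    print(build_prompt(question, retrieve(question, passages, top_k=1)))
```

The design choice worth noting is that the model never needs to have seen the documents before: the relevant passages travel with the question, which is what lets an agent answer questions about a reader's own papers, books, or notes.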
In the future, we will all have our own personal LLM-powered agents – our superhuman teachers with near-infinite knowledge and patience. And they will transform the way we consume information and learn from documents. I’m excited to be part of this revolution.