Artificial intelligence

From Wikipedia, the free encyclopedia

Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals.[1]

High-profile applications of AI include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); virtual assistants (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., language models and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go). However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore."[2][3]

Various subfields of AI research are centered around particular goals and the use of particular tools. The traditional goals of AI research include learning, reasoning, knowledge representation, planning, natural language processing, perception, and support for robotics.[a] To reach these goals, AI researchers have adapted and integrated a wide range of techniques, including search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics.[b] AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields.[4] Some companies, such as OpenAI, Google DeepMind and Meta,[5] aim to create artificial general intelligence (AGI)—AI that can complete virtually any cognitive task at least as well as a human.

Artificial intelligence was founded as an academic discipline in 1956,[6] and the field went through multiple cycles of optimism throughout its history,[7][8] followed by periods of disappointment and loss of funding, known as AI winters.[9][10] Funding and interest vastly increased after 2012 when graphics processing units started being used to accelerate neural networks and deep learning outperformed previous AI techniques.[11] This growth accelerated further after 2017 with the transformer architecture.[12] In the 2020s, an ongoing period of rapid progress in advanced generative AI became known as the AI boom. Generative AI's ability to create and modify content has led to several unintended consequences and harms, which has raised ethical concerns about AI's long-term effects and potential existential risks, prompting discussions about regulatory policies to ensure the safety and benefits of the technology.

Goals

The general problem of simulating (or creating) intelligence has been broken into subproblems. These consist of particular traits or capabilities that researchers expect an intelligent system to display. The traits described below have received the most attention and cover the scope of AI research.[a]

Reasoning and problem-solving

Early researchers developed algorithms that imitated step-by-step reasoning that humans use when they solve puzzles or make logical deductions.[13] By the late 1980s and 1990s, methods were developed for dealing with uncertain or incomplete information, employing concepts from probability and economics.[14]

Many of these algorithms are insufficient for solving large reasoning problems because they experience a "combinatorial explosion": They become exponentially slower as the problems grow.[15] Even humans rarely use the step-by-step deduction that early AI research could model. They solve most of their problems using fast, intuitive judgments.[16] Accurate and efficient reasoning is an unsolved problem.

Knowledge representation

An ontology represents knowledge as a set of concepts within a domain and the relationships between those concepts.

Knowledge representation and knowledge engineering[17] allow AI programs to answer questions intelligently and make deductions about real-world facts. Formal knowledge representations are used in content-based indexing and retrieval,[18] scene interpretation,[19] clinical decision support,[20] knowledge discovery (mining "interesting" and actionable inferences from large databases),[21] and other areas.[22]

A knowledge base is a body of knowledge represented in a form that can be used by a program. An ontology is the set of objects, relations, concepts, and properties used by a particular domain of knowledge.[23] Knowledge bases need to represent things such as objects, properties, categories, and relations between objects;[24] situations, events, states, and time;[25] causes and effects;[26] knowledge about knowledge (what we know about what other people know);[27] default reasoning (things that humans assume are true until they are told differently and will remain true even when other facts are changing);[28] and many other aspects and domains of knowledge.

Among the most difficult problems in knowledge representation are the breadth of commonsense knowledge (the set of atomic facts that the average person knows is enormous);[29] and the sub-symbolic form of most commonsense knowledge (much of what people know is not represented as "facts" or "statements" that they could express verbally).[16] There is also the difficulty of knowledge acquisition, the problem of obtaining knowledge for AI applications.[c]

Planning and decision-making

An "agent" is anything that perceives and takes actions in the world. A rational agent has goals or preferences and takes actions to make them happen.[d][32] In automated planning, the agent has a specific goal.[33] In automated decision-making, the agent has preferences—there are some situations it would prefer to be in, and some situations it is trying to avoid. The decision-making agent assigns a number to each situation (called the "utility") that measures how much the agent prefers it. For each possible action, it can calculate the "expected utility": the utility of all possible outcomes of the action, weighted by the probability that the outcome will occur. It can then choose the action with the maximum expected utility.[34]
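The expected-utility rule described above can be sketched in a few lines of Python. This is only an illustration: the action names, probabilities, and utilities are invented for the example, and the helper names `expected_utility` and `best_action` are hypothetical.

```python
from typing import Dict, List, Tuple

# Each action maps to a list of (probability, utility) pairs, one per outcome.
# All numbers below are made up for illustration.
Outcomes = List[Tuple[float, float]]


def expected_utility(outcomes: Outcomes) -> float:
    """Sum the utility of each outcome, weighted by its probability."""
    return sum(p * u for p, u in outcomes)


def best_action(actions: Dict[str, Outcomes]) -> str:
    """Return the action with the maximum expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


actions = {
    # outcomes: (rain, utility), (no rain, utility)
    "take_umbrella": [(0.3, 8.0), (0.7, 6.0)],
    "leave_umbrella": [(0.3, 0.0), (0.7, 10.0)],
}

choice = best_action(actions)
print(choice)
```

With these numbers the umbrella is left at home, because 0.3 × 0 + 0.7 × 10 = 7.0 exceeds 0.3 × 8 + 0.7 × 6 = 6.6.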

In classical planning, the agent knows exactly what the effect of any action will be.[35] In most real-world problems, however, the agent may not be certain about the situation it is in (the situation is "unknown" or "unobservable") and it may not know for certain what will happen after each possible action (the outcome is not "deterministic"). It must choose an action by making a probabilistic guess and then reassess the situation to see if the action worked.[36]

In some problems, the agent's preferences may be uncertain, especially if there are other agents or humans involved. These can be learned (e.g., with inverse reinforcement learning), or the agent can seek information to improve its preferences.[37] Information value theory can be used to weigh the value of exploratory or experimental actions.[38] The space of possible future actions and situations is typically intractably large, so the agents must take actions and evaluate situations while being uncertain of what the outcome will be.

A Markov decision process has a transition model that describes the probability that a particular action will change the state in a particular way and a reward function that supplies the utility of each state and the cost of each action. A policy associates a decision with each possible state. The policy could be calculated (e.g., by iteration), be heuristic, or it can be learned.[39]
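As a rough sketch of how a policy might be "calculated by iteration", the following Python code runs value iteration on a tiny, invented two-state process. The states, actions, rewards, and the discount factor `gamma` are assumptions made for the example, and value iteration is only one of several possible algorithms.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, reward)
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(outcomes, values, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions: Transitions, gamma: float = 0.9, tol: float = 1e-8):
    """Return (state values, greedy policy) for a small finite MDP."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, acts in transitions.items():
            best = max(q_value(outs, values, gamma) for outs in acts.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # The policy associates one decision with each possible state.
    policy = {
        s: max(acts, key=lambda a: q_value(acts[a], values, gamma))
        for s, acts in transitions.items()
    }
    return values, policy


# Hypothetical robot with a "high" or "low" battery (illustrative numbers).
mdp = {
    "high": {
        "work": [(0.8, "high", 2.0), (0.2, "low", 2.0)],
        "rest": [(1.0, "high", 0.0)],
    },
    "low": {
        "work": [(1.0, "low", -1.0)],
        "rest": [(1.0, "high", 0.0)],
    },
}

values, policy = value_iteration(mdp)
print(policy)
```

In this toy model the computed policy works while the battery is high and rests while it is low.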

Game theory describes the rational behavior of multiple interacting agents and is used in AI programs that make decisions that involve other agents.[40]

Learning

Machine learning is the study of programs that can improve their performance on a given task automatically.[41] It has been a part of AI from the beginning.[e]

In supervised learning, the training data is labelled with the expected answers, while in unsupervised learning, the model identifies patterns or structures in unlabelled data.

There are several kinds of machine learning. Unsupervised learning analyzes a stream of data and finds patterns and makes predictions without any other guidance.[44] Supervised learning requires labeling the training data with the expected answers, and comes in two main varieties: classification (where the program must learn to predict what category the input belongs in) and regression (where the program must deduce a numeric function based on numeric input).[45]
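To make the regression case concrete, here is a minimal sketch that fits a straight line to labelled numeric data by ordinary least squares. The function name `fit_line` and the data points are assumptions for the example; real systems typically use far richer models.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to labelled examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training data: numeric inputs labelled with the expected numeric answers.
inputs = [1.0, 2.0, 3.0, 4.0]
labels = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(inputs, labels)
prediction = slope * 5.0 + intercept  # prediction for an unseen input
print(slope, intercept, prediction)
```

Because the illustrative labels lie exactly on the line y = 2x + 1, the fit recovers that line and predicts 11 for the input 5.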

In reinforcement learning, the agent is rewarded for good responses and punished for bad ones. The agent learns to choose responses that are classified as "good".[46] Transfer learning is when the knowledge gained from one problem is applied to a new problem.[47] Deep learning is a type of machine learning that runs inputs through biologically inspired artificial neural networks for all of these types of learning.[48]

Computational learning theory can assess learners by computational complexity, by sample complexity (how much data is required), or by other notions of optimization.[49]

Natural language processing

Natural language processing (NLP) allows programs to read, write and communicate in human languages.[50] Specific problems include speech recognition, speech synthesis, machine translation, information extraction, information retrieval and question answering.[51]

Early work, based on Noam Chomsky's generative grammar and semantic networks, had difficulty with word-sense disambiguation[f] unless restricted to small domains called "micro-worlds" (due to the common sense knowledge problem[29]). Margaret Masterman believed that it was meaning and not grammar that was the key to understanding languages, and that thesauri and not dictionaries should be the basis of computational language structure.

Modern deep learning techniques for NLP include word embedding (representing words, typically as vectors encoding their meaning),[52] transformers (a deep learning architecture using an attention mechanism),[53] and others.[54] In 2019, generative pre-trained transformer (or "GPT") language models began to generate coherent text,[55][56] and by 2023, these models were able to get human-level scores on the bar exam, the SAT, the GRE, and in many other real-world applications.[57]
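The idea that word vectors encode meaning can be illustrated with cosine similarity, a common way to compare embeddings. The three-dimensional vectors below are hand-made for the example; real embeddings are learned from data and usually have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy embeddings, invented for illustration only.
embeddings = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.95],
}

royal = cosine_similarity(embeddings["king"], embeddings["queen"])
fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(royal > fruit)
```

In this toy space, "king" lies much closer to "queen" than to "apple", which is the kind of relationship learned embeddings are meant to capture.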

Perception

Machine perception is the ability to use input from sensors (such as cameras, microphones, wireless signals, active lidar, sonar, radar, and tactile sensors) to deduce aspects of the world. Computer vision is the ability to analyze visual input.[58]

The field includes speech recognition,[59] image classification,[60] facial recognition, object recognition,[61] object tracking,[62] and robotic perception.[63]

Social intelligence

Kismet, a robot head which was made in the 1990s; it is a machine that can recognize and simulate emotions.[64]

Affective computing is a field that comprises systems that recognize, interpret, process, or simulate human feeling, emotion, and mood.[65] For example, some virtual assistants are programmed to speak conversationally or even to banter humorously; it makes them appear more sensitive to the emotional dynamics of human interaction, or to otherwise facilitate human–computer interaction.

However, this tends to give naïve users an unrealistic conception of the intelligence of existing computer agents.[66] Moderate successes related to affective computing include textual sentiment analysis and, more recently, multimodal sentiment analysis, wherein AI classifies the affects displayed by a videotaped subject.[67]

General intelligence

A machine with artificial general intelligence would be able to solve a wide variety of problems with breadth and versatility similar to human intelligence.[68]

Techniques

AI research uses a wide variety of techniques to accomplish the goals above.[b]

Search and optimization

AI can solve many problems by intelligently searching through many possible solutions.[69] There are two very different kinds of search used in AI: state space search and local search.

State space search searches through a tree of possible states to try to find a goal state.[70] For example, planning algorithms search through trees of goals and subgoals, attempting to find a path to a target goal, a process called means-ends analysis.[71]

Simple exhaustive searches[72] are rarely sufficient for most real-world problems: the search space (the number of places to search) quickly grows to astronomical numbers. The result is a search that is too slow or never completes.[15] "Heuristics" or "rules of thumb" can help prioritize choices that are more likely to reach a goal.[73]
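One well-known way to use a heuristic in state space search is the A* algorithm, which the text above does not name and which is used here only as an example. The sketch below finds the shortest path on a small grid, using Manhattan distance as the rule of thumb; the grid layout is invented.

```python
import heapq


def a_star(grid, start, goal):
    """Length of the shortest 4-connected path on a grid, or None.

    Cells equal to 0 are free and cells equal to 1 are walls.
    """
    def heuristic(cell):
        # Manhattan distance never overestimates the remaining steps.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]  # (estimate, cost so far, cell)
    best_cost = {start: 0}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best_cost[cell]:
            continue  # stale queue entry; a cheaper route was found already
        row, col = cell
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            r, c = nxt
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(a_star(grid, (0, 0), (3, 3)))  # prints 6
```

The heuristic steers the search toward cells that look closer to the goal, so fewer states are expanded than in an uninformed exhaustive search.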

Adversarial search is used for game-playing programs, such as chess or Go. It searches through a tree of possible moves and countermoves, looking for a winning position.[74]
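Searching a tree of moves and countermoves is often illustrated with the minimax procedure. The following sketch assumes a hypothetical game tree written as nested lists, where each number is the value of a final position for the player who moves first.

```python
def minimax(node, maximizing=True):
    """Value of a game tree when both players play optimally.

    A leaf is a number; an inner node is a list of child nodes.
    """
    if isinstance(node, (int, float)):
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)


# Three possible moves, each answered by one of three countermoves.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # prints 3
```

The first player picks the move whose worst-case reply is best: the opponent would reduce the three branches to 3, 2, and 2, so the first branch is chosen.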

Illustration of gradient descent for 3 different starting points; two parameters (represented by the coordinates in the horizontal plane) are adjusted in order to minimize the loss function (the height)

Local search uses mathematical optimization to find a solution to a problem. It begins with some form of guess and refines it incrementally.[75]

Gradient descent is a type of local search that optimizes a set of numerical parameters by incrementally adjusting them to minimize a loss function. Variants of gradient descent are commonly used to train neural networks,[76] through the backpropagation algorithm.
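A minimal sketch of gradient descent on two parameters follows. The loss function, its hand-written gradient, the learning rate, and the step count are all assumptions chosen to keep the example small; neural network training computes the gradient with backpropagation instead.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly move the parameters a small step against the gradient."""
    params = list(start)
    for _ in range(steps):
        gradient = grad(params)
        params = [p - learning_rate * g for p, g in zip(params, gradient)]
    return params


def loss(params):
    """Illustrative bowl-shaped loss with its minimum at (3, -1)."""
    x, y = params
    return (x - 3.0) ** 2 + (y + 1.0) ** 2


def loss_gradient(params):
    """Partial derivatives of the loss with respect to x and y."""
    x, y = params
    return [2.0 * (x - 3.0), 2.0 * (y + 1.0)]


x_min, y_min = gradient_descent(loss_gradient, start=[0.0, 0.0])
print(round(x_min, 4), round(y_min, 4))
```

Starting from (0, 0), each step shrinks the distance to the minimum by a constant factor, so the parameters settle very close to (3, -1).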

Another type of local search is evolutionary computation, which aims to iteratively improve a set of candidate solutions by "mutating" and "recombining" them, selecting only the fittest to survive each generation.[77]

Distributed search processes can coordinate via swarm intelligence algorithms. Two popular swarm algorithms used in search are particle swarm optimization (inspired by bird flocking) and ant colony optimization (inspired by ant trails).[78]

Logic

Formal logic is used for reasoning and knowledge representation.[79] Formal logic comes in two main forms: propositional logic (which operates on statements that are true or false and uses logical connectives such as "and", "or", "not" and "implies")[80] and predicate logic (which also operates on objects, predicates and relations and uses quantifiers such as "Every X is a Y" and "There are some Xs that are Ys").[81]

Deductive reasoning in logic is the process of proving a new statement (conclusion) from other statements that are given and assumed to be true (the premises).[82] Proofs can be structured as proof trees, in which nodes are labelled by sentences, and child nodes are connected to parent nodes by inference rules.

Given a problem and a set of premises, problem-solving reduces to searching for a proof tree whose root node is labelled by a solution of the problem and whose leaf nodes are labelled by premises or axioms. In the case of Horn clauses, problem-solving search can be performed by reasoning forwards from the premises or backwards from the problem.[83] In the more general case of the clausal form of first-order logic, resolution is a single, axiom-free rule of inference, in which a problem is solved by proving a contradiction from premises that include the negation of the problem to be solved.[84]
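Reasoning forwards from the premises can be sketched for the propositional case as follows. Each rule is a Horn clause written as a pair (premises, conclusion); the facts and rules are invented, and the sketch does not handle variables or quantifiers.

```python
def forward_chain(facts, rules):
    """Derive every atom that follows from the facts via the Horn rules.

    rules is a list of (premises, conclusion) pairs; a rule fires once
    all of its premises are known.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


rules = [
    (("rain",), "wet_ground"),
    (("wet_ground", "freezing"), "ice"),
    (("ice",), "slippery"),
]

derived = forward_chain({"rain", "freezing"}, rules)
print("slippery" in derived)  # prints True
```

Backward reasoning would instead start from the goal "slippery" and look for rules whose conclusions match it, recursing on their premises.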

Inference in both Horn clause logic and first-order logic is undecidable, and therefore intractable. However, backward reasoning with Horn clauses, which underpins computation in the logic programming language Prolog, is Turing complete. Moreover, its efficiency is competitive with computation in other symbolic programming languages.[85]

Fuzzy logic assigns a "degree of truth" between 0 and 1. It can therefore handle propositions that are vague and partially true.[86]
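One common choice of fuzzy connectives, assumed here for illustration, takes the minimum for "and", the maximum for "or", and one minus the value for "not". The truth degrees below are invented.

```python
def fuzzy_and(a, b):
    """Degree to which both propositions hold (minimum rule)."""
    return min(a, b)


def fuzzy_or(a, b):
    """Degree to which at least one proposition holds (maximum rule)."""
    return max(a, b)


def fuzzy_not(a):
    """Degree to which the proposition does not hold."""
    return 1.0 - a


warm = 0.75   # "the room is warm" is mostly true
humid = 0.5   # "the room is humid" is half true

muggy = fuzzy_and(warm, humid)
comfortable = fuzzy_not(muggy)
print(muggy, comfortable)  # prints 0.5 0.5
```

Other families of fuzzy operators exist, so this pairing should be read as one option rather than the definition.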

Non-monotonic logics, including logic programming with negation as failure, are designed to handle default reasoning.[28] Other specialized versions of logic have been developed to describe many complex domains.

Probabilistic methods for uncertain reasoning

A simple Bayesian network, with the associated conditional probability tables

Many problems in AI (including reasoning, planning, learning, perception, and robotics) require the agent to operate with incomplete or uncertain information. AI researchers have devised a number of tools to solve these problems using methods from probability theory and economics.[87] Precise mathematical tools have been developed that analyze how an agent can make choices and plan, using decision theory, decision analysis,[88] and information value theory.[89] These tools include models such as Markov decision processes,[90] dynamic decision networks,[91] game theory and mechanism design.[92]

Bayesian networks[93] are a tool that can be used for reasoning (using the Bayesian inference algorithm),[g][95] learning (using the expectation–maximization algorithm),[h][97] planning (using decision networks)[98] and perception (using dynamic Bayesian networks).[91]
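The simplest possible Bayesian network has two nodes, a hidden cause and an observed effect, and inference in it reduces to Bayes' rule. The sketch below assumes a hypothetical disease and test with made-up probabilities.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """P(hypothesis | evidence) by Bayes' rule for a binary hypothesis."""
    p_evidence = prior * p_evidence_if_true + (1.0 - prior) * p_evidence_if_false
    return prior * p_evidence_if_true / p_evidence


# Illustrative numbers: 1% of people have the disease, the test detects
# 90% of true cases and gives a false positive for 5% of healthy people.
p_disease_given_positive = posterior(
    prior=0.01,
    p_evidence_if_true=0.90,
    p_evidence_if_false=0.05,
)
print(round(p_disease_given_positive, 3))  # prints 0.154
```

Even after a positive test, the disease remains unlikely because it is rare to begin with. Larger networks chain many such conditional probability tables together.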

Probabilistic algorithms can also be used for filtering, prediction, smoothing, and finding explanations for streams of data, thus helping perception systems analyze processes that occur over time (e.g., hidden Markov models or Kalman filters).[91]
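Filtering with a hidden Markov model can be sketched as a loop that alternates a prediction step and an update step. The weather model below (hidden state "rain" or "dry", observed umbrella) and all of its probabilities are assumptions made for the example.

```python
def forward_filter(prior, transition, emission, observations):
    """Belief over the hidden state after each observation is absorbed."""
    belief = dict(prior)
    for obs in observations:
        # Predict: push the current belief through the transition model.
        predicted = {
            s2: sum(belief[s1] * transition[s1][s2] for s1 in belief)
            for s2 in belief
        }
        # Update: weight by how well each state explains the observation.
        unnormalized = {s: emission[s][obs] * predicted[s] for s in predicted}
        total = sum(unnormalized.values())
        belief = {s: v / total for s, v in unnormalized.items()}
    return belief


prior = {"rain": 0.5, "dry": 0.5}
transition = {
    "rain": {"rain": 0.7, "dry": 0.3},
    "dry": {"rain": 0.3, "dry": 0.7},
}
emission = {
    "rain": {"umbrella": 0.9, "none": 0.1},
    "dry": {"umbrella": 0.2, "none": 0.8},
}

belief = forward_filter(prior, transition, emission, ["umbrella", "umbrella"])
print(round(belief["rain"], 3))  # prints 0.883
```

Seeing an umbrella on two consecutive days raises the estimated probability of rain from 0.5 to about 0.88.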

Expectation–maximization clustering of Old Faithful eruption data starts from a random guess but then successfully converges on an accurate clustering of the two physically distinct modes of eruption.

Classifiers and statistical learning methods

The simplest AI applications can be divided into two types: classifiers (e.g., "if shiny then diamond"), on one hand, and controllers (e.g., "if diamond then pick up"), on the other hand. Classifiers[99] are functions that use pattern matching to determine the closest match. They can be fine-tuned based on chosen examples using supervised learning. Each pattern (also called an "observation") is labeled with a certain predefined class. All the observations combined with their class labels are known as a data set. When a new observation is received, that observation is classified based on previous experience.[45]

There are many kinds of classifiers in use.[100] The decision tree is the simplest and most widely used symbolic machine learning algorithm.[101] The k-nearest neighbor algorithm was the most widely used analogical AI until the mid-1990s, and kernel methods such as the support vector machine (SVM) displaced k-nearest neighbor in the 1990s.[102] The naive Bayes classifier is reportedly the "most widely used learner"[103] at Google, due in part to its scalability.[104] Neural networks are also used as classifiers.[105]
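As an illustration of classifying a new observation by its closest matches, here is a small k-nearest neighbor sketch. The two-feature data set and its labels are invented, and `knn_classify` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_classify(data_set, observation, k=3):
    """Label an observation by majority vote among its k nearest examples.

    data_set is a list of (features, label) pairs.
    """
    nearest = sorted(data_set, key=lambda item: math.dist(item[0], observation))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Labelled observations: (brightness, hardness) -> class.
data_set = [
    ((1.0, 1.0), "pebble"),
    ((1.2, 0.8), "pebble"),
    ((0.9, 1.1), "pebble"),
    ((5.0, 5.0), "diamond"),
    ((5.2, 4.8), "diamond"),
    ((4.9, 5.1), "diamond"),
]

print(knn_classify(data_set, (5.0, 4.9)))  # prints diamond
print(knn_classify(data_set, (1.0, 0.9)))  # prints pebble
```

No explicit training step is needed: the stored examples themselves act as the model, which is why the method is described as analogical.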

Artificial neural networks

A neural network is an interconnected group of nodes, akin to the vast network of neurons in the human brain.

An artificial neural network is based on a collection of nodes also known as artificial neurons, which loosely model the neurons in a biological brain. It is trained to recognise patterns; once trained, it can recognise those patterns in fresh data. There is an input, at least one hidden layer of nodes and an output. Each node applies a function and once the weight crosses its specified threshold, the data is transmitted to the next layer. A network is typically called a deep neural network if it has at least 2 hidden layers.[105]
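The input, hidden layer, and output structure can be sketched with a tiny network whose weights are set by hand rather than learned. The sketch assumes a simple threshold (step) activation and computes the exclusive-or of two binary inputs; practical networks use smooth activations and learn their weights from data.

```python
def step(x):
    """Threshold activation: the node fires only if its input is positive."""
    return 1.0 if x > 0 else 0.0


def layer(inputs, weights, biases):
    """Each node sums its weighted inputs plus a bias, then applies step."""
    return [
        step(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]


def xor_network(x1, x2):
    # Hidden node 1 fires if at least one input is on, node 2 if both are.
    hidden = layer([x1, x2], weights=[[1.0, 1.0], [1.0, 1.0]], biases=[-0.5, -1.5])
    # The output fires when "at least one" holds but "both" does not.
    return layer(hidden, weights=[[1.0, -1.0]], biases=[-0.5])[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

A network without the hidden layer cannot represent exclusive-or, which is one reason hidden layers matter.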

Learning algorithms for neural networks use local search to choose the weights that will get the right output for each input during training. The most common training technique is the backpropagation algorithm.[106] Neural networks learn to model complex relationships between inputs and outputs and find patterns in data. In theory, a neural network can learn any function.[107]

In feedforward neural networks the signal passes in only one direction.[108] The term perceptron typically refers to a single-layer neural network.[109] In contrast, deep learning uses many layers.[110] Recurrent neural networks (RNNs) feed the output signal back into the input, which allows short-term memories of previous input events. Long short-term memory networks (LSTMs) are recurrent neural networks that better preserve long-term dependencies and are less sensitive to the vanishing gradient problem.[111] Convolutional neural networks (CNNs) use layers of kernels to more efficiently process local patterns. This local processing is especially important in image processing, where the early CNN layers typically identify simple local patterns such as edges and curves, with subsequent layers detecting more complex patterns like textures, and eventually whole objects.[112]

Deep learning

Deep learning is a subset of machine learning, which is itself a subset of artificial intelligence.[113]

Deep learning uses several layers of neurons between the network's inputs and outputs.[110] The multiple layers can progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify the concepts relevant to a human such as digits, letters, or faces.[114]

Deep learning has profoundly improved the performance of programs in many important subfields of artificial intelligence, including computer vision, speech recognition, natural language processing, image classification,[115] and others. The reason that deep learning performs so well in so many applications is not known as of 2021.[116] The sudden success of deep learning in 2012–2015 did not occur because of some new discovery or theoretical breakthrough (deep neural networks and backpropagation had been described by many people, as far back as the 1950s)[i] but because of two factors: the incredible increase in computer power (including the hundred-fold increase in speed by switching to GPUs) and the availability of vast amounts of training data, especially the giant curated datasets used for benchmark testing, such as ImageNet.[j]

GPT

Generative pre-trained transformers (GPT) are large language models (LLMs) that generate text based on the semantic relationships between words in sentences. Text-based GPT models are pre-trained on a large corpus of text that can be from the Internet. The pretraining consists of predicting the next token (a token being usually a word, subword, or punctuation). Throughout this pretraining, GPT models accumulate knowledge about the world and can then generate human-like text by repeatedly predicting the next token. Typically, a subsequent training phase makes the model more truthful, useful, and harmless, usually with a technique called reinforcement learning from human feedback (RLHF). Current GPT models are prone to generating falsehoods called "hallucinations". These can be reduced with RLHF and quality data, but the problem has been getting worse for reasoning systems.[124] Such systems are used in chatbots, which allow people to ask a question or request a task in simple text.[125][126]
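Generating text by repeatedly predicting the next token can be illustrated with a much simpler model than a transformer. The sketch below is a bigram counter, not a GPT: it only records which word follows which in a tiny invented corpus, and it always picks the most frequent follower.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for every token, how often each other token follows it."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, start, length):
    """Extend the text one token at a time with the likeliest follower."""
    output = [start]
    for _ in range(length):
        followers = counts.get(output[-1])
        if not followers:
            break  # the last token was never seen with a successor
        output.append(followers.most_common(1)[0][0])
    return output


corpus = "the cat sat on the mat and the cat sat on the rug".split()
model = train_bigram(corpus)
print(" ".join(generate(model, "the", 4)))  # prints: the cat sat on the
```

A real language model replaces the count table with a neural network that conditions on a long context and samples from a probability distribution over tokens, but the generation loop has the same shape.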

Current models and services include ChatGPT, Claude, Gemini, Copilot, and Meta AI.[127] Multimodal GPT models can process different types of data (modalities) such as images, videos, sound, and text.[128]

Hardware and software

In the late 2010s, graphics processing units (GPUs), increasingly designed with AI-specific enhancements and used with specialized software such as TensorFlow, replaced central processing units (CPUs) as the dominant means of training large-scale (commercial and academic) machine learning models.[129] Specialized programming languages such as Prolog were used in early AI research,[130] but general-purpose programming languages like Python have become predominant.[131]

The transistor density in integrated circuits has been observed to roughly double every 18 months—a trend known as Moore's law, named after the Intel co-founder Gordon Moore, who first identified it. Improvements in GPUs have been even faster,[132] a trend sometimes called Huang's law,[133] named after Nvidia co-founder and CEO Jensen Huang.

Applications

AI and machine learning technology is used in most of the essential applications of the 2020s, including search engines (such as Google Search), targeted online advertising (AdSense, Facebook), recommendation systems (offered by Netflix, YouTube or Amazon), driving internet traffic, virtual assistants (such as Siri or Alexa), autonomous vehicles (including drones, ADAS and self-driving cars), automatic language translation (Microsoft Translator, Google Translate), facial recognition (Apple's FaceID or Microsoft's DeepFace and Google's FaceNet) and image labeling (used by Facebook, Apple's Photos and TikTok). The deployment of AI may be overseen by a chief automation officer (CAO).

Health and medicine

The application of AI in medicine and medical research has the potential to improve patient care and quality of life.[134] Through the lens of the Hippocratic Oath, medical professionals are ethically compelled to use AI if applications can more accurately diagnose and treat patients.[135][136]

For medical research, AI is an important tool for processing and integrating big data. This is particularly important for organoid and tissue engineering development which use microscopy imaging as a key technique in fabrication.[137] It has been suggested that AI can overcome discrepancies in funding allocated to different fields of research.[137][138] New AI tools can deepen the understanding of biomedically relevant pathways. For example, AlphaFold 2 (2021) demonstrated the ability to approximate, in hours rather than months, the 3D structure of a protein.[139] In 2023, it was reported that AI-guided drug discovery helped find a class of antibiotics capable of killing two different types of drug-resistant bacteria.[140] In 2024, researchers used machine learning to accelerate the search for Parkinson's disease drug treatments. Their aim was to identify compounds that block the clumping, or aggregation, of alpha-synuclein (the protein that characterises Parkinson's disease). They were able to speed up the initial screening process ten-fold and reduce the cost by a thousand-fold.[141][142]

Games

Game-playing programs have been used since the 1950s to demonstrate and test AI's most advanced techniques.[143] Deep Blue became the first computer chess-playing system to beat a reigning world chess champion, Garry Kasparov, on 11 May 1997.[144] In 2011, in a Jeopardy! quiz show exhibition match, IBM's question answering system, Watson, defeated the two greatest Jeopardy! champions, Brad Rutter and Ken Jennings, by a significant margin.[145] In March 2016, AlphaGo won 4 out of 5 games of Go in a match with Go champion Lee Sedol, becoming the first computer Go-playing system to beat a professional Go player without handicaps. Then, in 2017, it defeated Ke Jie, who was the best Go player in the world.[146] Other programs handle imperfect-information games, such as the poker-playing program Pluribus.[147] DeepMind developed increasingly general reinforcement learning models, such as MuZero, which could be trained to play chess, Go, or Atari games.[148] In 2019, DeepMind's AlphaStar achieved grandmaster level in StarCraft II, a particularly challenging real-time strategy game that involves incomplete knowledge of what happens on the map.[149] In 2021, an AI agent competed in a PlayStation Gran Turismo competition, winning against four of the world's best Gran Turismo drivers using deep reinforcement learning.[150] In 2024, Google DeepMind introduced SIMA, a type of AI capable of autonomously playing nine previously unseen open-world video games by observing screen output, as well as executing short, specific tasks in response to natural language instructions.[151]

Mathematics

Large language models, such as GPT-4, Gemini, Claude, Llama or Mistral, are increasingly used in mathematics. These probabilistic models are versatile, but can also produce wrong answers in the form of hallucinations. They sometimes need a large database of mathematical problems to learn from, as well as methods such as supervised fine-tuning[152] or trained classifiers with human-annotated data to improve answers for new problems and learn from corrections.[153] A February 2024 study showed that some language models reasoned poorly on math problems not included in their training data, even when the problems deviated only slightly from that data.[154] One technique to improve their performance involves training the models to produce correct reasoning steps, rather than just the correct result.[155] The Alibaba Group developed a version of its Qwen models called Qwen2-Math, which achieved state-of-the-art performance on several mathematical benchmarks, including 84% accuracy on the MATH dataset of competition mathematics problems.[156] In January 2025, Microsoft proposed rStar-Math, a technique that leverages Monte Carlo tree search and step-by-step reasoning, enabling a relatively small language model like Qwen-7B to solve 53% of the AIME 2024 and 90% of the MATH benchmark problems.[157]

Alternatively, dedicated models have been developed for mathematical problem solving with higher-precision outcomes, including theorem proving. Examples include AlphaTensor, AlphaGeometry, AlphaProof and AlphaEvolve[158] (all from Google DeepMind),[159] Llemma from EleutherAI,[160] and Julius.[161]

When natural language is used to describe mathematical problems, converters can transform such prompts into a formal language such as Lean to define mathematical tasks. The experimental model Gemini Deep Think accepts natural language prompts directly and achieved gold medal results in the International Math Olympiad of 2025.[162]

Some models have been developed to solve challenging problems and reach good results in benchmark tests, others to serve as educational tools in mathematics.[163]

Topological deep learning integrates various topological approaches.

Finance

Finance is one of the fastest growing sectors where applied AI tools are being deployed: from retail online banking to investment advice and insurance, where automated "robot advisers" have been in use for some years.[164]

According to Nicolas Firzli, director of the World Pensions & Investments Forum, it may be too early to see the emergence of highly innovative AI-informed financial products and services. He argues that "the deployment of AI tools will simply further automatise things: destroying tens of thousands of jobs in banking, financial planning, and pension advice in the process, but I'm not sure it will unleash a new wave of [e.g., sophisticated] pension innovation."[165]

Military

Various countries are deploying AI military applications.[166] The main applications enhance command and control, communications, sensors, integration and interoperability.[167] Research is targeting intelligence collection and analysis, logistics, cyber operations, information operations, and semiautonomous and autonomous vehicles.[166] AI technologies enable coordination of sensors and effectors, threat detection and identification, marking of enemy positions, target acquisition, coordination and deconfliction of distributed Joint Fires between networked combat vehicles, both human-operated and autonomous.[167]

AI has been used in military operations in Iraq, Syria, Israel and Ukraine.[166][168][169][170]

Generative AI

Vincent van Gogh in watercolour created by generative AI software

Generative artificial intelligence (Generative AI, GenAI,[171] or GAI) is a subfield of artificial intelligence that uses generative models to produce text, images, videos, or other forms of data.[172][173][174] These models learn the underlying patterns and structures of their training data and use them to produce new data[175][176] based on the input, which often comes in the form of natural language prompts.[177][178]

Generative AI tools have become more common since the AI boom in the 2020s. This boom was made possible by improvements in transformer-based deep neural networks, particularly large language models (LLMs). Major tools include chatbots such as ChatGPT, Copilot, Gemini, Claude, Grok, and DeepSeek; text-to-image models such as Stable Diffusion, Midjourney, and DALL-E; and text-to-video models such as Veo, LTXV and Sora.[179][180][181][182][183] Technology companies developing generative AI include OpenAI, Anthropic, Meta AI, Microsoft, Google, DeepSeek, and Baidu.[177][184][185]

Generative AI has raised many ethical questions and governance challenges as it can be used for cybercrime, or to deceive or manipulate people through fake news or deepfakes.[186][187] Even if used ethically, it may lead to mass replacement of human jobs.[188] The tools themselves have been criticized as violating intellectual property laws, since they are trained on copyrighted works.[189]

Agents

AI agents are software entities designed to perceive their environment, make decisions, and take actions autonomously to achieve specific goals. These agents can interact with users, their environment, or other agents. AI agents are used in various applications, including virtual assistants, chatbots, autonomous vehicles, game-playing systems, and industrial robotics. AI agents operate within the constraints of their programming, available computational resources, and hardware limitations. This means they are restricted to performing tasks within their defined scope and have finite memory and processing capabilities. In real-world applications, AI agents often face time constraints for decision-making and action execution. Many AI agents incorporate learning algorithms, enabling them to improve their performance over time through experience or training. Using machine learning, AI agents can adapt to new situations and optimise their behaviour for their designated tasks.[190][191][192]
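The perceive, decide and act cycle described above can be illustrated with a minimal sketch. The `ThermostatAgent` class, the `run` function and the toy environment are hypothetical names invented for this example; they only show the general shape of an agent loop with bounded memory, not any particular agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class ThermostatAgent:
    """Hypothetical agent that tries to keep a room near a target temperature."""
    target: float
    memory_limit: int = 5                      # finite memory, as noted in the text
    history: list = field(default_factory=list)

    def perceive(self, temperature: float) -> None:
        # Store the newest reading and forget anything beyond the memory limit.
        self.history.append(temperature)
        self.history = self.history[-self.memory_limit:]

    def decide(self) -> str:
        # Choose an action from the most recent observation only.
        current = self.history[-1]
        if current < self.target - 0.5:
            return "heat"
        if current > self.target + 0.5:
            return "cool"
        return "idle"


def run(agent, temperature, steps):
    """Toy environment: each action shifts the temperature by one degree."""
    effect = {"heat": 1.0, "cool": -1.0, "idle": 0.0}
    actions = []
    for _ in range(steps):
        agent.perceive(temperature)
        action = agent.decide()
        temperature += effect[action]
        actions.append(action)
    return temperature, actions


agent = ThermostatAgent(target=21.0)
final_temperature, actions = run(agent, 17.0, steps=8)
print(final_temperature, actions)
```

A learning agent would replace the fixed rule in `decide` with a policy that is updated from experience; the loop itself stays the same.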

Sexuality

Applications of AI in this domain include AI-enabled menstruation and fertility trackers that analyze user data to offer predictions,[193] AI-integrated sex toys (e.g., teledildonics),[194] AI-generated sexual education content,[195] and AI agents that simulate sexual and romantic partners (e.g., Replika).[196] AI is also used for the production of non-consensual deepfake pornography, raising significant ethical and legal concerns.[197]

AI technologies have also been used to attempt to identify online gender-based violence and online sexual grooming of minors.[198][199]

Other industry-specific tasks

There are also thousands of successful AI applications used to solve specific problems for specific industries or institutions. In a 2017 survey, one in five companies reported having incorporated "AI" in some offerings or processes.[200] A few examples are energy storage, medical diagnosis, military logistics, applications that predict the result of judicial decisions, foreign policy, and supply chain management.

AI applications for evacuation and disaster management are growing. AI has been used to investigate patterns in large-scale and small-scale evacuations using historical data from GPS, videos or social media. Furthermore, AI can provide real-time information on the evacuation conditions.[201][202][203]

In agriculture, AI has helped farmers to increase yield and identify areas that need irrigation, fertilization, or pesticide treatments. Agronomists use AI to conduct research and development. AI has been used to predict the ripening time for crops such as tomatoes, monitor soil moisture, operate agricultural robots, conduct predictive analytics, classify the emotions expressed in pig calls, automate greenhouses, detect diseases and pests, and save water.

Artificial intelligence is used in astronomy to analyze increasing amounts of available data and applications, mainly for "classification, regression, clustering, forecasting, generation, discovery, and the development of new scientific insights." For example, it is used for discovering exoplanets, forecasting solar activity, and distinguishing between signals and instrumental effects in gravitational wave astronomy. Additionally, it could be used for activities in space, such as space exploration, including the analysis of data from space missions, real-time science decisions of spacecraft, space debris avoidance, and more autonomous operation.

During the 2024 Indian elections, US$50 million was spent on authorized AI-generated content, notably by creating deepfakes of allied (including sometimes deceased) politicians to better engage with voters, and by translating speeches to various local languages.[204]

Ethics

Street art in Tel Aviv[205][206]

AI has potential benefits and potential risks.[207] AI may be able to advance science and find solutions for serious problems: Demis Hassabis of DeepMind hopes to "solve intelligence, and then use that to solve everything else".[208] However, as the use of AI has become widespread, several unintended consequences and risks have been identified.[209][210] Systems in production sometimes fail to factor ethics and bias into their training processes, especially when the underlying algorithms, as in deep learning, are inherently unexplainable.[211]

Risks and harm

Machine learning algorithms require large amounts of data. The techniques used to acquire this data have raised concerns about privacy, surveillance and copyright.

AI-powered devices and services, such as virtual assistants and IoT products, continuously collect personal information, raising concerns about intrusive data gathering and unauthorized access by third parties. The loss of privacy is further exacerbated by AI's ability to process and combine vast amounts of data, potentially leading to a surveillance society where individual activities are constantly monitored and analyzed without adequate safeguards or transparency.

Sensitive user data collected may include online activity records, geolocation data, video, or audio.[212] For example, in order to build speech recognition algorithms, Amazon has recorded millions of private conversations and allowed temporary workers to listen to and transcribe some of them.[213] Opinions about this widespread surveillance range from those who see it as a necessary evil to those for whom it is clearly unethical and a violation of the right to privacy.[214]

AI developers argue that this is the only way to deliver valuable applications and have developed several techniques that attempt to preserve privacy while still obtaining the data, such as data aggregation, de-identification and differential privacy.[215] Since 2016, some privacy experts, such as Cynthia Dwork, have begun to view privacy in terms of fairness. Brian Christian wrote that experts have pivoted "from the question of 'what they know' to the question of 'what they're doing with it'."[216]
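As an illustration of one of the techniques named above, the following is a minimal sketch of differential privacy using the Laplace mechanism on a counting query. The function names, the sample data and the choice of epsilon are assumptions made for this example; production systems involve considerably more care (privacy budgets, composition, secure noise generation).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that satisfy `predicate`."""
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1 (sensitivity 1),
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 29, 62, 57, 38, 44]        # made-up data
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng)
print(noisy)  # close to the true count of 4, but not exactly 4
```

A smaller epsilon adds more noise and gives stronger privacy; a larger epsilon gives a more accurate answer at the cost of weaker privacy.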

Generative AI is often trained on unlicensed copyrighted works, including in domains such as images or computer code; the output is then used under the rationale of "fair use". Experts disagree about how well and under what circumstances this rationale will hold up in courts of law; relevant factors may include "the purpose and character of the use of the copyrighted work" and "the effect upon the potential market for the copyrighted work".[217][218] Website owners who do not wish to have their content scraped can indicate it in a "robots.txt" file.[219] In 2023, leading authors (including John Grisham and Jonathan Franzen) sued AI companies for using their work to train generative AI.[220][221] Another discussed approach is to envision a separate sui generis system of protection for creations generated by AI to ensure fair attribution and compensation for human authors.[222]
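The "robots.txt" mechanism mentioned above can be sketched with Python's standard `urllib.robotparser` module. The crawler name `ExampleAIBot` and the URL are hypothetical. The file is a request rather than an enforcement mechanism: it only has an effect if the crawler chooses to consult it.

```python
from urllib.robotparser import RobotFileParser

# A site that asks one (hypothetical) AI crawler to stay away
# while allowing every other crawler.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.org/essay.html"
ai_bot_allowed = parser.can_fetch("ExampleAIBot", url)
other_bot_allowed = parser.can_fetch("OtherBot", url)
print(ai_bot_allowed, other_bot_allowed)
```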

Dominance by tech giants

The commercial AI scene is dominated by Big Tech companies such as Alphabet Inc., Amazon, Apple Inc., Meta Platforms, and Microsoft.[223][224][225] Some of these players already own the vast majority of existing cloud infrastructure and computing power from data centers, allowing them to entrench further in the marketplace.[226][227]

Power needs and environmental impacts

In January 2024, the International Energy Agency (IEA) released Electricity 2024, Analysis and Forecast to 2026, forecasting electric power use.[228] This is the first IEA report to make projections for data centers and power consumption for artificial intelligence and cryptocurrency. The report states that power demand for these uses might double by 2026, with the additional electric power usage equal to the electricity used by the whole of Japan.[229]

Prodigious power consumption by AI is responsible for the growth of fossil fuel use, and might delay closings of obsolete, carbon-emitting coal energy facilities. There is a feverish rise in the construction of data centers throughout the US, making large technology firms (e.g., Microsoft, Meta, Google, Amazon) into voracious consumers of electric power. Projected electric consumption is so immense that there is concern that it will be fulfilled no matter the source. A ChatGPT search uses 10 times as much electrical energy as a Google search. The large firms are in haste to find power sources – from nuclear energy to geothermal to fusion. The tech firms argue that – in the long view – AI will eventually be kinder to the environment, but they need the energy now. AI makes the power grid more efficient and "intelligent", will assist in the growth of nuclear power, and track overall carbon emissions, according to technology firms.[230]

A 2024 Goldman Sachs Research Paper, AI Data Centers and the Coming US Power Demand Surge, found "US power demand (is) likely to experience growth not seen in a generation...." and forecasts that, by 2030, US data centers will consume 8% of US power, as opposed to 3% in 2022, presaging growth for the electrical power generation industry by a variety of means.[231] Data centers' need for more and more electrical power is such that they might max out the electrical grid. The Big Tech companies counter that AI can be used to maximize the utilization of the grid by all.[232]

In 2024, the Wall Street Journal reported that big AI companies have begun negotiations with the US nuclear power providers to provide electricity to the data centers. In March 2024 Amazon purchased a Pennsylvania nuclear-powered data center for US$650 million.[233] Nvidia CEO Jensen Huang said nuclear power is a good option for the data centers.[234]

In September 2024, Microsoft announced an agreement with Constellation Energy to re-open the Three Mile Island nuclear power plant to provide Microsoft with 100% of all electric power produced by the plant for 20 years. Reopening the plant, which suffered a partial nuclear meltdown of its Unit 2 reactor in 1979, will require Constellation to get through strict regulatory processes which will include extensive safety scrutiny from the US Nuclear Regulatory Commission. If approved (this would be the first ever US re-commissioning of a nuclear plant), the plant will produce over 835 megawatts of power, enough for 800,000 homes. The cost for re-opening and upgrading is estimated at US$1.6 billion and is dependent on tax breaks for nuclear power contained in the 2022 US Inflation Reduction Act.[235] The US government and the state of Michigan are investing almost US$2 billion to reopen the Palisades Nuclear reactor on Lake Michigan. Closed since 2022, the plant is planned to be reopened in October 2025. The Three Mile Island facility will be renamed the Crane Clean Energy Center after Chris Crane, a nuclear proponent and former CEO of Exelon who was responsible for Exelon's spinoff of Constellation.[236]
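The figures quoted for Three Mile Island can be put in perspective with a little arithmetic. The sketch below uses only the two numbers from the text (835 megawatts and 800,000 homes); the assumption that the plant runs at full output for every hour of the year is a simplification made for this example, so the annual figure is an upper bound rather than a forecast.

```python
plant_output_mw = 835        # from the text
homes_served = 800_000       # from the text

# Average continuous power available per home, in kilowatts.
avg_kw_per_home = plant_output_mw * 1_000 / homes_served

# Upper bound on yearly energy, assuming full output all year (an assumption).
HOURS_PER_YEAR = 8_760
max_annual_twh = plant_output_mw * HOURS_PER_YEAR / 1_000_000

print(f"{avg_kw_per_home:.2f} kW per home")            # about 1.04 kW
print(f"{max_annual_twh:.2f} TWh per year at most")    # about 7.31 TWh
```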

After the last approval in September 2023, Taiwan suspended the approval of data centers north of Taoyuan with a capacity of more than 5 MW in 2024, due to power supply shortages.[237] Taiwan aims to phase out nuclear power by 2025.[237] On the other hand, Singapore imposed a ban on the opening of data centers in 2019 because of concerns about electric power supply, but lifted this ban in 2022.[237]

Although most nuclear plants in Japan were shut down after the 2011 Fukushima nuclear accident, according to an October 2024 Bloomberg article in Japanese, cloud gaming services company Ubitus, in which Nvidia has a stake, is looking for land in Japan near a nuclear power plant for a new data center for generative AI.[238] Ubitus CEO Wesley Kuo said nuclear power plants are the most efficient, cheap and stable power for AI.[238]

On 1 November 2024, the Federal Energy Regulatory Commission (FERC) rejected an application submitted by Talen Energy for approval to supply some electricity from the nuclear power station Susquehanna to Amazon's data center.[239] According to the Commission Chairman Willie L. Phillips, it is a burden on the electricity grid as well as a significant cost shifting concern to households and other business sectors.[239]

In 2025, a report prepared by the International Energy Agency estimated the greenhouse gas emissions from the energy consumption of AI at 180 million tonnes. By 2035, these emissions could rise to 300–500 million tonnes depending on what measures are taken. This is below 1.5% of the energy sector emissions. The emissions reduction potential of AI was estimated at 5% of the energy sector emissions, but rebound effects (for example if people switch from public transport to autonomous cars) can reduce it.[240]

Misinformation

YouTube, Facebook and others use recommender systems to guide users to more content. These AI programs were given the goal of maximizing user engagement (that is, the only goal was to keep people watching). The AI learned that users tended to choose misinformation, conspiracy theories, and extreme partisan content, and, to keep them watching, the AI recommended more of it. Users also tended to watch more content on the same subject, so the AI led people into filter bubbles where they received multiple versions of the same misinformation.[241] This convinced many users that the misinformation was true, and ultimately undermined trust in institutions, the media and the government.[242] The AI program had correctly learned to maximize its goal, but the result was harmful to society. After the U.S. election in 2016, major technology companies took some steps to mitigate the problem.[243]
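The dynamic described above, in which a system that optimises only for engagement drifts towards whatever content keeps people watching, can be sketched as a simple epsilon-greedy bandit. This is not how any particular platform works: the content categories, the watch probabilities and the `recommend_loop` function are all invented for illustration.

```python
import random


def recommend_loop(watch_prob, rounds=5000, epsilon=0.05, seed=0):
    """Recommend one category per round, maximising the observed watch rate."""
    rng = random.Random(seed)
    shows = {c: 0 for c in watch_prob}      # times each category was recommended
    watched = {c: 0 for c in watch_prob}    # times the user kept watching

    def observed_rate(category):
        # Untried categories look maximally promising so each gets tried.
        if shows[category] == 0:
            return 1.0
        return watched[category] / shows[category]

    for _ in range(rounds):
        if rng.random() < epsilon:
            choice = rng.choice(list(watch_prob))       # occasional exploration
        else:
            choice = max(watch_prob, key=observed_rate)  # exploit best guess
        shows[choice] += 1
        if rng.random() < watch_prob[choice]:            # simulated user reaction
            watched[choice] += 1
    return shows


# Made-up probabilities that a simulated user keeps watching each category.
watch_prob = {"news": 0.30, "hobby": 0.35, "conspiracy": 0.60}
shows = recommend_loop(watch_prob)
share = shows["conspiracy"] / sum(shows.values())
print(shows, round(share, 2))
```

Nothing in the objective refers to accuracy or social harm, so the loop concentrates on the most engaging category purely because that is what it was asked to maximise.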

In the early 2020s, generative AI began to create images, audio, and texts that are virtually indistinguishable from real photographs, recordings, or human writing,[244] while realistic AI-generated videos became feasible in the mid-2020s.[245][246][247] It is possible for bad actors to use this technology to create massive amounts of misinformation or propaganda;[248] one such potential malicious use is deepfakes for computational propaganda.[249] AI pioneer Geoffrey Hinton expressed concern about AI enabling "authoritarian leaders to manipulate their electorates" on a large scale, among other risks.[250]

AI researchers at Microsoft, OpenAI, universities and other organisations have suggested using "personhood credentials" as a way to overcome online deception enabled by AI models.[251]

Algorithmic bias and fairness

Machine learning applications will be biased[k] if they learn from biased data.[253] The developers may not be aware that the bias exists.[254] Bias can be introduced by the way training data is selected and by the way a model is deployed.[255][253] If a biased algorithm is used to make decisions that can seriously harm people (as it can in medicine, finance, recruitment, housing or policing) then the algorithm may cause discrimination.[256] The field of fairness studies how to prevent harms from algorithmic biases.

On June 28, 2015, Google Photos's new image labeling feature mistakenly identified Jacky Alcine and a friend as "gorillas" because they were black. The system was trained on a dataset that contained very few images of black people,[257] a problem called "sample size disparity".[258] Google "fixed" this problem by preventing the system from labelling anything as a "gorilla". Eight years later, in 2023, Google Photos still could not identify a gorilla, and neither could similar products from Apple, Facebook, Microsoft and Amazon.[259]

COMPAS is a commercial program widely used by U.S. courts to assess the likelihood of a defendant becoming a recidivist. In 2016, Julia Angwin at ProPublica discovered that COMPAS exhibited racial bias, despite the fact that the program was not told the races of the defendants. Although the overall accuracy for both whites and blacks was calibrated to be equal at about 61%, the errors for each race were different: the system consistently overestimated the chance that a black person would re-offend and underestimated the chance that a white person would re-offend.[260] In 2017, several researchers[l] showed that it was mathematically impossible for COMPAS to accommodate all possible measures of fairness when the base rates of re-offense were different for whites and blacks in the data.[262]
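How two groups can share the same overall accuracy (and hence the same overall error rate) while experiencing very different kinds of error can be shown with a pair of confusion matrices. The counts below are invented for illustration and are not the COMPAS data. The `implied_fpr` function encodes a standard identity that links the false positive rate to the base rate, the positive predictive value and the false negative rate, which is why these measures cannot all be equalised when base rates differ.

```python
def rates(tp, fp, fn, tn):
    """Summary statistics for one group's confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "base_rate": (tp + fn) / total,       # share who actually re-offended
        "fpr": fp / (fp + tn),                # wrongly flagged as high risk
        "fnr": fn / (fn + tp),                # wrongly flagged as low risk
        "ppv": tp / (tp + fp),                # flagged people who re-offended
    }


def implied_fpr(base_rate, ppv, fnr):
    # Identity: FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR)
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)


# Hypothetical counts, 1,000 people per group.
group_a = rates(tp=310, fp=250, fn=140, tn=300)
group_b = rates(tp=160, fp=120, fn=270, tn=450)

for name, r in (("A", group_a), ("B", group_b)):
    print(name, {k: round(v, 3) for k, v in r.items()})
```

In this example both groups have 61% accuracy, yet group A is far more often wrongly flagged as high risk and group B far more often wrongly flagged as low risk.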

A program can make biased decisions even if the data does not explicitly mention a problematic feature (such as "race" or "gender"). The feature will correlate with other features (like "address", "shopping history" or "first name"), and the program will make the same decisions based on these features as it would on "race" or "gender".[263] Moritz Hardt said "the most robust fact in this research area is that fairness through blindness doesn't work."[264]
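The failure of "fairness through blindness" can be demonstrated with a toy example. The records below are invented: each has a group label, a postcode that happens to correlate with the group, and a past decision. The hypothetical `train_blind` function never looks at the group, yet the rule it learns reproduces the disparity in the historical decisions because the postcode acts as a proxy.

```python
from collections import Counter, defaultdict

# (group, postcode, past_decision) with 1 = approved; made-up data.
history = (
    [("A", "north", 1)] * 40 + [("A", "north", 0)] * 5
    + [("A", "south", 1)] * 3 + [("A", "south", 0)] * 2
    + [("B", "south", 0)] * 35 + [("B", "south", 1)] * 10
    + [("B", "north", 1)] * 4 + [("B", "north", 0)] * 1
)


def train_blind(rows):
    """Learn the majority past decision per postcode; the group is ignored."""
    votes = defaultdict(Counter)
    for _group, postcode, decision in rows:
        votes[postcode][decision] += 1
    return {pc: counts.most_common(1)[0][0] for pc, counts in votes.items()}


model = train_blind(history)


def approval_rate(group):
    rows = [row for row in history if row[0] == group]
    return sum(model[postcode] for _g, postcode, _d in rows) / len(rows)


print(model)
print(approval_rate("A"), approval_rate("B"))
```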

Criticism of COMPAS highlighted that machine learning models are designed to make "predictions" that are only valid if we assume that the future will resemble the past. If they are trained on data that includes the results of racist decisions in the past, machine learning models must predict that racist decisions will be made in the future. If an application then uses these predictions as recommendations, some of these "recommendations" will likely be racist.[265] Thus, machine learning is not well suited to help make decisions in areas where there is hope that the future will be better than the past. It is descriptive rather than prescriptive.[m]

Bias and unfairness may go undetected because the developers are overwhelmingly white and male: among AI engineers, about 4% are black and 20% are women.[258]

There are various conflicting definitions and mathematical models of fairness. These notions depend on ethical assumptions, and are influenced by beliefs about society. One broad category is distributive fairness, which focuses on the outcomes, often identifying groups and seeking to compensate for statistical disparities. Representational fairness tries to ensure that AI systems do not reinforce negative stereotypes or render certain groups invisible. Procedural fairness focuses on the decision process rather than the outcome. The most relevant notions of fairness may depend on the context, notably the type of AI application and the stakeholders. The subjectivity in the notions of bias and fairness makes it difficult for companies to operationalize them. Having access to sensitive attributes such as race or gender is also considered by many AI ethicists to be necessary in order to compensate for biases, but it may conflict with anti-discrimination laws.[252]

At its 2022 Conference on Fairness, Accountability, and Transparency (ACM FAccT 2022) in Seoul, South Korea, the Association for Computing Machinery presented and published findings recommending that, until AI and robotics systems are demonstrated to be free of bias mistakes, they should be considered unsafe, and that the use of self-learning neural networks trained on vast, unregulated sources of flawed internet data should be curtailed.[267]

Lack of transparency

Many AI systems are so complex that their designers cannot explain how they reach their decisions.[268] This is particularly true of deep neural networks, in which there are many non-linear relationships between inputs and outputs. Some popular explainability techniques nevertheless exist.[269]

It is impossible to be certain that a program is operating correctly if no one knows how exactly it works. There have been many cases where a machine learning program passed rigorous tests, but nevertheless learned something different than what the programmers intended. For example, a system that could identify skin diseases better than medical professionals was found to actually have a strong tendency to classify images with a ruler as "cancerous", because pictures of malignancies typically include a ruler to show the scale.[270] Another machine learning system designed to help effectively allocate medical resources was found to classify patients with asthma as being at "low risk" of dying from pneumonia. Having asthma is actually a severe risk factor, but since the patients having asthma would usually get much more medical care, they were relatively unlikely to die according to the training data. The correlation between asthma and low risk of dying from pneumonia was real, but misleading.[271]

People who have been harmed by an algorithm's decision have a right to an explanation.[272] Doctors, for example, are expected to clearly and completely explain to their colleagues the reasoning behind any decision they make. Early drafts of the European Union's General Data Protection Regulation in 2016 included an explicit statement that this right exists.[n] Industry experts noted that this is an unsolved problem with no solution in sight. Regulators argued that nevertheless the harm is real: if the problem has no solution, the tools should not be used.[273]

DARPA established the XAI ("Explainable Artificial Intelligence") program in 2014 to try to solve these problems.[274]

Several approaches aim to address the transparency problem. SHAP makes it possible to visualise the contribution of each feature to the output.[275] LIME can locally approximate a model's outputs with a simpler, interpretable model.[276] Multitask learning provides a large number of outputs in addition to the target classification. These other outputs can help developers deduce what the network has learned.[277] Deconvolution, DeepDream and other generative methods can allow developers to see what different layers of a deep network for computer vision have learned, and produce output that can suggest what the network is learning.[278] For generative pre-trained transformers, Anthropic developed a technique based on dictionary learning that associates patterns of neuron activations with human-understandable concepts.[279]
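SHAP is built on Shapley values from cooperative game theory. The sketch below computes exact Shapley values for a tiny, hypothetical scoring function by averaging each feature's marginal contribution over every ordering of the features. It is not the API of the SHAP library, which relies on approximations because exact enumeration becomes infeasible beyond a handful of features. The `risk_score` model, the feature names and the baseline are assumptions made for this example.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley attribution of model(instance) - model(baseline)."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)          # start with every feature "absent"
        previous = model(current)
        for name in order:
            current[name] = instance[name]    # switch this feature on
            value = model(current)
            totals[name] += value - previous  # its marginal contribution
            previous = value
    return {name: total / len(orders) for name, total in totals.items()}


def risk_score(x):
    # Hypothetical model with an interaction between age and debt.
    return 2.0 * x["age"] + 1.0 * x["income"] + 0.5 * x["age"] * x["debt"]


instance = {"age": 1.0, "income": 2.0, "debt": 4.0}
baseline = {"age": 0.0, "income": 0.0, "debt": 0.0}
phi = shapley_values(risk_score, instance, baseline)
print(phi)
```

The attributions always sum to the difference between the model's output for the instance and for the baseline, and the interaction term is split evenly between the two features involved.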

Bad actors and weaponized AI

Artificial intelligence provides a number of tools that are useful to bad actors, such as authoritarian governments, terrorists, criminals or rogue states.

A lethal autonomous weapon is a machine that locates, selects and engages human targets without human supervision.[o] Widely available AI tools can be used by bad actors to develop inexpensive autonomous weapons and, if produced at scale, they are potentially weapons of mass destruction.[281] Even when used in conventional warfare, they currently cannot reliably choose targets and could potentially kill an innocent person.[281] In 2014, 30 nations (including China) supported a ban on autonomous weapons under the United Nations' Convention on Certain Conventional Weapons; however, the United States and others disagreed.[282] By 2015, over fifty countries were reported to be researching battlefield robots.[283]

AI tools make it easier for authoritarian governments to efficiently control their citizens in several ways. Face and voice recognition allow widespread surveillance. Machine learning, operating on this data, can classify potential enemies of the state and prevent them from hiding. Recommendation systems can precisely target propaganda and misinformation for maximum effect. Deepfakes and generative AI aid in producing misinformation. Advanced AI can make authoritarian centralized decision-making more competitive than liberal and decentralized systems such as markets. It lowers the cost and difficulty of digital warfare and advanced spyware.[284] All these technologies have been available since 2020 or earlier; AI facial recognition systems are already being used for mass surveillance in China.[285][286]

There are many other ways in which AI is expected to help bad actors, some of which cannot be foreseen. For example, machine-learning AI is able to design tens of thousands of toxic molecules in a matter of hours.[287]

Technological unemployment

Economists have frequently highlighted the risks of redundancies from AI, and speculated about unemployment if there is no adequate social policy for full employment.[288]

In the past, technology has tended to increase rather than reduce total employment, but economists acknowledge that "we're in uncharted territory" with AI.[289] A survey of economists showed disagreement about whether the increasing use of robots and AI will cause a substantial increase in long-term unemployment, but they generally agree that it could be a net benefit if productivity gains are redistributed.[290] Risk estimates vary; for example, in the 2010s, Michael Osborne and Carl Benedikt Frey estimated 47% of U.S. jobs are at "high risk" of potential automation, while an OECD report classified only 9% of U.S. jobs as "high risk".[p][292] The methodology of speculating about future employment levels has been criticised as lacking evidential foundation, and for implying that technology, rather than social policy, creates unemployment, as opposed to redundancies.[288] In April 2023, it was reported that 70% of the jobs for Chinese video game illustrators had been eliminated by generative artificial intelligence.[293][294]

Unlike previous waves of automation, many middle-class jobs may be eliminated by artificial intelligence; The Economist stated in 2015 that "the worry that AI could do to white-collar jobs what steam power did to blue-collar ones during the Industrial Revolution" is "worth taking seriously".[295] Jobs at extreme risk range from paralegals to fast food cooks, while job demand is likely to increase for care-related professions ranging from personal healthcare to the clergy.[296]

From the early days of the development of artificial intelligence, there have been arguments, for example, those put forward by Joseph Weizenbaum, about whether tasks that can be done by computers actually should be done by them, given the difference between computers and humans, and between quantitative calculation and qualitative, value-based judgement.[297]

Existential risk

It has been argued that AI will become so powerful that humanity may irreversibly lose control of it. This could, as physicist Stephen Hawking stated, "spell the end of the human race".[298] This scenario has been common in science fiction, in which a computer or robot suddenly develops a human-like "self-awareness" (or "sentience" or "consciousness") and becomes a malevolent character.[q] These sci-fi scenarios are misleading in several ways.

First, AI does not require human-like sentience to be an existential risk. Modern AI programs are given specific goals and use learning and intelligence to achieve them. Philosopher Nick Bostrom argued that if one gives almost any goal to a sufficiently powerful AI, it may choose to destroy humanity to achieve it (he used the example of a paperclip maximizer).[300] Stuart Russell gives the example of a household robot that tries to find a way to kill its owner to prevent it from being unplugged, reasoning that "you can't fetch the coffee if you're dead."[301] In order to be safe for humanity, a superintelligence would have to be genuinely aligned with humanity's morality and values so that it is "fundamentally on our side".[302]

Second, Yuval Noah Harari argues that AI does not require a robot body or physical control to pose an existential risk. The essential parts of civilization are not physical. Things like ideologies, law, government, money and the economy are built on language; they exist because there are stories that billions of people believe. The current prevalence of misinformation suggests that an AI could use language to convince people to believe anything, even to take actions that are destructive.[303]

The opinions amongst experts and industry insiders are mixed, with sizable fractions both concerned and unconcerned by risk from eventual superintelligent AI.[304] Personalities such as Stephen Hawking, Bill Gates, and Elon Musk,[305] as well as AI pioneers such as Yoshua Bengio, Stuart Russell, Demis Hassabis, and Sam Altman, have expressed concerns about existential risk from AI.

In May 2023, Geoffrey Hinton announced his resignation from Google in order to be able to "freely speak out about the risks of AI" without "considering how this impacts Google".[306] He notably mentioned risks of an AI takeover,[307] and stressed that in order to avoid the worst outcomes, establishing safety guidelines will require cooperation among those competing in use of AI.[308]

In 2023, many leading AI experts endorsed the joint statement that "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war".[309]

Some other researchers were more optimistic. AI pioneer Jürgen Schmidhuber did not sign the joint statement, emphasising that in 95% of all cases, AI research is about making "human lives longer and healthier and easier."[310] While the tools that are now being used to improve lives can also be used by bad actors, "they can also be used against the bad actors."[311][312] Andrew Ng also argued that "it's a mistake to fall for the doomsday hype on AI—and that regulators who do will only benefit vested interests."[313] Yann LeCun "scoffs at his peers' dystopian scenarios of supercharged misinformation and even, eventually, human extinction."[314] In the early 2010s, experts argued that the risks are too distant in the future to warrant research or that humans will be valuable from the perspective of a superintelligent machine.[315] However, after 2016, the study of current and future risks and possible solutions became a serious area of research.[316]

Ethical machines and alignment

Friendly AI are machines that have been designed from the beginning to minimize risks and to make choices that benefit humans. Eliezer Yudkowsky, who coined the term, argues that developing friendly AI should be a higher research priority: it may require a large investment and it must be completed before AI becomes an existential risk.[317]

Machines with intelligence have the potential to use their intelligence to make ethical decisions. The field of machine ethics provides machines with ethical principles and procedures for resolving ethical dilemmas.[318] The field of machine ethics is also called computational morality,[318] and was founded at an AAAI symposium in 2005.[319]

Other approaches include Wendell Wallach's "artificial moral agents"[320] and Stuart J. Russell's three principles for developing provably beneficial machines.[321]

Open source

Active organizations in the AI open-source community include Hugging Face,[322] Google,[323] EleutherAI and Meta.[324] Various AI models, such as Llama 2, Mistral or Stable Diffusion, have been made open-weight,[325][326] meaning that their architecture and trained parameters (the "weights") are publicly available. Open-weight models can be freely fine-tuned, which allows companies to specialize them with their own data and for their own use-case.[327] Open-weight models are useful for research and innovation but can also be misused. Since they can be fine-tuned, any built-in security measure, such as objecting to harmful requests, can be trained away until it becomes ineffective. Some researchers warn that future AI models may develop dangerous capabilities (such as the potential to drastically facilitate bioterrorism) and that once released on the Internet, they cannot be deleted everywhere if needed. They recommend pre-release audits and cost-benefit analyses.[328]

Frameworks

Artificial intelligence projects can be guided by ethical considerations during the design, development, and implementation of an AI system. An AI framework such as the Care and Act Framework, developed by the Alan Turing Institute and based on the SUM values, outlines four main ethical dimensions, defined as follows:[329][330]

  • Respect the dignity of individual people
  • Connect with other people sincerely, openly, and inclusively
  • Care for the wellbeing of everyone
  • Protect social values, justice, and the public interest

Other developments in ethical frameworks include those decided upon during the Asilomar Conference, the Montreal Declaration for Responsible AI, and the IEEE's Ethics of Autonomous Systems initiative, among others;[331] however, these principles are not without criticism, especially regarding the people chosen to contribute to these frameworks.[332]

Promotion of the wellbeing of the people and communities that these technologies affect requires consideration of the social and ethical implications at all stages of AI system design, development and implementation, and collaboration between job roles such as data scientists, product managers, data engineers, domain experts, and delivery managers.[333]

In 2024, the UK AI Safety Institute released 'Inspect', a testing toolset for AI safety evaluations. It is published under an MIT open-source licence, is freely available on GitHub, and can be extended with third-party packages. It can be used to evaluate AI models in a range of areas including core knowledge, ability to reason, and autonomous capabilities.[334]

Regulation

AI Safety Summit
The first global AI Safety Summit was held in the United Kingdom in November 2023 with a declaration calling for international cooperation.

The regulation of artificial intelligence is the development of public sector policies and laws for promoting and regulating AI; it is therefore related to the broader regulation of algorithms.[335] The regulatory and policy landscape for AI is an emerging issue in jurisdictions globally.[336] According to AI Index at Stanford, the annual number of AI-related laws passed in the 127 survey countries jumped from one passed in 2016 to 37 passed in 2022 alone.[337][338] Between 2016 and 2020, more than 30 countries adopted dedicated strategies for AI.[339] Most EU member states had released national AI strategies, as had Canada, China, India, Japan, Mauritius, the Russian Federation, Saudi Arabia, United Arab Emirates, U.S., and Vietnam. Others were in the process of elaborating their own AI strategy, including Bangladesh, Malaysia and Tunisia.[339] The Global Partnership on Artificial Intelligence was launched in June 2020, stating a need for AI to be developed in accordance with human rights and democratic values, to ensure public confidence and trust in the technology.[339] Henry Kissinger, Eric Schmidt, and Daniel Huttenlocher published a joint statement in November 2021 calling for a government commission to regulate AI.[340] In 2023, OpenAI leaders published recommendations for the governance of superintelligence, which they believe may happen in less than 10 years.[341] In 2023, the United Nations also launched an advisory body to provide recommendations on AI governance; the body comprises technology company executives, government officials and academics.[342] In 2024, the Council of Europe created the first international legally binding treaty on AI, called the "Framework Convention on Artificial Intelligence and Human Rights, Democracy and the Rule of Law". It was adopted by the European Union, the United States, the United Kingdom, and other signatories.[343]

In a 2022 Ipsos survey, attitudes towards AI varied greatly by country; 78% of Chinese citizens, but only 35% of Americans, agreed that "products and services using AI have more benefits than drawbacks".[337] A 2023 Reuters/Ipsos poll found that 61% of Americans agree, and 22% disagree, that AI poses risks to humanity.[344] In a 2023 Fox News poll, 35% of Americans thought it "very important", and an additional 41% thought it "somewhat important", for the federal government to regulate AI, versus 13% responding "not very important" and 8% responding "not at all important".[345][346]

In November 2023, the first global AI Safety Summit was held at Bletchley Park in the UK to discuss the near- and far-term risks of AI and the possibility of mandatory and voluntary regulatory frameworks.[347] Twenty-eight countries, including the United States, China, and the European Union, issued a declaration at the start of the summit, calling for international co-operation to manage the challenges and risks of artificial intelligence.[348][349] In May 2024 at the AI Seoul Summit, 16 global AI tech companies agreed to safety commitments on the development of AI.[350][351]

History

In 2024, AI patents in China and the US numbered more than three-fourths of AI patents worldwide.[352] Though China had more AI patents, the US had 35% more patents per AI patent-applicant company than China.[352]

The study of mechanical or "formal" reasoning began with philosophers and mathematicians in antiquity. The study of logic led directly to Alan Turing's theory of computation, which suggested that a machine, by shuffling symbols as simple as "0" and "1", could simulate any conceivable form of mathematical reasoning.[353][354] This, along with concurrent discoveries in cybernetics, information theory and neurobiology, led researchers to consider the possibility of building an "electronic brain".[r] They developed several areas of research that would become part of AI,[356] such as McCulloch and Pitts's design for "artificial neurons" in 1943,[117] and Turing's influential 1950 paper "Computing Machinery and Intelligence", which introduced the Turing test and showed that "machine intelligence" was plausible.[357][354]

The field of AI research was founded at a workshop at Dartmouth College in 1956.[s][6] The attendees became the leaders of AI research in the 1960s.[t] They and their students produced programs that the press described as "astonishing":[u] computers were learning checkers strategies, solving word problems in algebra, proving logical theorems and speaking English.[v][7] Artificial intelligence laboratories were set up at a number of British and U.S. universities in the late 1950s and early 1960s.[354]

Researchers in the 1960s and the 1970s were convinced that their methods would eventually succeed in creating a machine with general intelligence and considered this the goal of their field.[361] In 1965 Herbert Simon predicted, "machines will be capable, within twenty years, of doing any work a man can do".[362] In 1967 Marvin Minsky agreed, writing that "within a generation ... the problem of creating 'artificial intelligence' will substantially be solved".[363] They had, however, underestimated the difficulty of the problem.[w] In 1974, both the U.S. and British governments cut off exploratory research in response to the criticism of Sir James Lighthill[365] and ongoing pressure from the U.S. Congress to fund more productive projects.[366] Minsky and Papert's book Perceptrons was understood as proving that artificial neural networks would never be useful for solving real-world tasks, thus discrediting the approach altogether.[367] The "AI winter", a period when obtaining funding for AI projects was difficult, followed.[9]

In the early 1980s, AI research was revived by the commercial success of expert systems,[368] a form of AI program that simulated the knowledge and analytical skills of human experts. By 1985, the market for AI had reached over a billion dollars. At the same time, Japan's fifth generation computer project inspired the U.S. and British governments to restore funding for academic research.[8] However, beginning with the collapse of the Lisp Machine market in 1987, AI once again fell into disrepute, and a second, longer-lasting winter began.[10]

Up to this point, most of AI's funding had gone to projects that used high-level symbols to represent mental objects like plans, goals, beliefs, and known facts. In the 1980s, some researchers began to doubt that this approach would be able to imitate all the processes of human cognition, especially perception, robotics, learning and pattern recognition,[369] and began to look into "sub-symbolic" approaches.[370] Rodney Brooks rejected "representation" in general and focussed directly on engineering machines that move and survive.[x] Judea Pearl, Lotfi Zadeh, and others developed methods that handled incomplete and uncertain information by making reasonable guesses rather than precise logic.[87][375] But the most important development was the revival of "connectionism", including neural network research, by Geoffrey Hinton and others.[376] In 1990, Yann LeCun successfully showed that convolutional neural networks can recognize handwritten digits, the first of many successful applications of neural networks.[377]

AI gradually restored its reputation in the late 1990s and early 21st century by exploiting formal mathematical methods and by finding specific solutions to specific problems. This "narrow" and "formal" focus allowed researchers to produce verifiable results and collaborate with other fields (such as statistics, economics and mathematics).[378] By 2000, solutions developed by AI researchers were being widely used, although in the 1990s they were rarely described as "artificial intelligence" (a tendency known as the AI effect).[379] However, several academic researchers became concerned that AI was no longer pursuing its original goal of creating versatile, fully intelligent machines. Beginning around 2002, they founded the subfield of artificial general intelligence (or "AGI"), which had several well-funded institutions by the 2010s.[68]

Deep learning began to dominate industry benchmarks in 2012 and was adopted throughout the field.[11] For many specific tasks, other methods were abandoned.[y] Deep learning's success was based on both hardware improvements (faster computers,[381] graphics processing units, cloud computing[382]) and access to large amounts of data[383] (including curated datasets,[382] such as ImageNet). Deep learning's success led to an enormous increase in interest and funding in AI.[z] The amount of machine learning research (measured by total publications) increased by 50% in the years 2015–2019.[339]

The number of Google searches for the term "AI" accelerated in 2022.

In 2016, issues of fairness and the misuse of technology were catapulted into center stage at machine learning conferences, publications vastly increased, funding became available, and many researchers re-focussed their careers on these issues. The alignment problem became a serious field of academic study.[316]

In the late 2010s and early 2020s, AGI companies began to deliver programs that created enormous interest. In 2015, AlphaGo, developed by DeepMind, beat the world champion Go player. The program was taught only the game's rules and developed a strategy by itself. GPT-3 is a large language model that was released in 2020 by OpenAI and is capable of generating high-quality human-like text.[384] ChatGPT, launched on November 30, 2022, became the fastest-growing consumer software application in history, gaining over 100 million users in two months.[385] It marked what is widely regarded as AI's breakout year, bringing it into the public consciousness.[386] These programs, and others, inspired an aggressive AI boom, in which large companies began investing billions of dollars in AI research. According to AI Impacts, about US$50 billion annually was invested in "AI" around 2022 in the U.S. alone and about 20% of the new U.S. Computer Science PhD graduates have specialized in "AI".[387] About 800,000 "AI"-related U.S. job openings existed in 2022.[388] According to PitchBook research, 22% of newly funded startups in 2024 claimed to be AI companies.[389]

Philosophy

Philosophical debates have historically sought to determine the nature of intelligence and how to make intelligent machines.[390] Another major focus has been whether machines can be conscious, and the associated ethical implications.[391] Many other topics in philosophy are relevant to AI, such as epistemology and free will.[392] Rapid advancements have intensified public discussions on the philosophy and ethics of AI.[391]

Defining artificial intelligence

Alan Turing wrote in 1950, "I propose to consider the question 'can machines think'?"[393] He advised changing the question from whether a machine "thinks" to "whether or not it is possible for machinery to show intelligent behaviour".[393] He devised the Turing test, which measures the ability of a machine to simulate human conversation.[357] Since we can only observe the behavior of the machine, it does not matter if it is "actually" thinking or literally has a "mind". Turing notes that we cannot determine these things about other people but "it is usual to have a polite convention that everyone thinks."[394]

The Turing test can provide some evidence of intelligence, but it penalizes non-human intelligent behavior.[395]

Russell and Norvig agree with Turing that intelligence must be defined in terms of external behavior, not internal structure.[1] However, they are critical that the test requires the machine to imitate humans. "Aeronautical engineering texts", they wrote, "do not define the goal of their field as making 'machines that fly so exactly like pigeons that they can fool other pigeons.'"[396] AI founder John McCarthy agreed, writing that "Artificial intelligence is not, by definition, simulation of human intelligence".[397]

McCarthy defines intelligence as "the computational part of the ability to achieve goals in the world".[398] Another AI founder, Marvin Minsky, similarly describes it as "the ability to solve hard problems".[399] The leading AI textbook defines it as the study of agents that perceive their environment and take actions that maximize their chances of achieving defined goals.[1] These definitions view intelligence in terms of well-defined problems with well-defined solutions, where both the difficulty of the problem and the performance of the program are direct measures of the "intelligence" of the machine—and no other philosophical discussion is required, or may not even be possible.

Another definition has been adopted by Google,[400] a major practitioner in the field of AI. This definition stipulates the ability of systems to synthesize information as the manifestation of intelligence, similar to the way it is defined in biological intelligence.

Some authors have suggested that, in practice, the definition of AI is vague and contested, with disagreement as to whether classical algorithms should be categorised as AI.[401] During the AI boom of the early 2020s, many companies used the term as a marketing buzzword, often even if they did "not actually use AI in a material way".[402]

There has been debate over whether large language models exhibit genuine intelligence or merely simulate it by imitating human text.[403]

Evaluating approaches to AI

No established unifying theory or paradigm has guided AI research for most of its history.[aa] The unprecedented success of statistical machine learning in the 2010s eclipsed all other approaches (so much so that some sources, especially in the business world, use the term "artificial intelligence" to mean "machine learning with neural networks"). This approach is mostly sub-symbolic, soft and narrow. Critics argue that the questions this choice leaves open (symbolic versus sub-symbolic, soft versus hard, narrow versus general) may have to be revisited by future generations of AI researchers.

Symbolic AI and its limits

Symbolic AI (or "GOFAI")[405] simulated the high-level conscious reasoning that people use when they solve puzzles, express legal reasoning and do mathematics. Symbolic programs were highly successful at "intelligent" tasks such as algebra or IQ tests. In the 1960s, Newell and Simon proposed the physical symbol systems hypothesis: "A physical symbol system has the necessary and sufficient means of general intelligent action."[406]

However, the symbolic approach failed on many tasks that humans solve easily, such as learning, recognizing an object or commonsense reasoning. Moravec's paradox is the discovery that high-level "intelligent" tasks were easy for AI, but low level "instinctive" tasks were extremely difficult.[407] Philosopher Hubert Dreyfus had argued since the 1960s that human expertise depends on unconscious instinct rather than conscious symbol manipulation, and on having a "feel" for the situation, rather than explicit symbolic knowledge.[408] Although his arguments had been ridiculed and ignored when they were first presented, eventually, AI research came to agree with him.[ab][16]

The issue is not resolved: sub-symbolic reasoning can make many of the same inscrutable mistakes that human intuition does, such as algorithmic bias. Critics such as Noam Chomsky argue that continuing research into symbolic AI will still be necessary to attain general intelligence,[410][411] in part because sub-symbolic AI is a move away from explainable AI: it can be difficult or impossible to understand why a modern statistical AI program made a particular decision. The emerging field of neuro-symbolic artificial intelligence attempts to bridge the two approaches.

Neat vs. scruffy

"Neats" hope that intelligent behavior can be described using simple, elegant principles (such as logic, optimization, or neural networks). "Scruffies" expect that it necessarily requires solving a large number of unrelated problems. Neats defend their programs with theoretical rigor; scruffies rely mainly on incremental testing to see if they work. This issue was actively discussed in the 1970s and 1980s,[412] but eventually was seen as irrelevant. Modern AI has elements of both.

Soft vs. hard computing

Finding a provably correct or optimal solution is intractable for many important problems.[15] Soft computing is a set of techniques, including genetic algorithms, fuzzy logic and neural networks, that are tolerant of imprecision, uncertainty, partial truth and approximation. Soft computing was introduced in the late 1980s and most successful AI programs in the 21st century are examples of soft computing with neural networks.
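
As a rough illustration of what tolerance for "partial truth" can mean, the following minimal sketch shows one common formulation of fuzzy logic, in which a statement holds to a degree between 0 and 1 rather than being simply true or false. The function names, the triangular membership shape and the temperature and humidity figures are illustrative assumptions, not taken from the cited sources; the min/max/complement operators are one standard choice among several.

```python
def triangular(x, left, peak, right):
    """Degree (0.0 to 1.0) to which x belongs to a triangle-shaped fuzzy set.

    Assumes left < peak < right. Membership is 0 outside (left, right)
    and rises linearly to 1 at the peak.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_and(a, b):
    """Conjunction of two truth degrees, taken as their minimum."""
    return min(a, b)


def fuzzy_or(a, b):
    """Disjunction of two truth degrees, taken as their maximum."""
    return max(a, b)


def fuzzy_not(a):
    """Negation of a truth degree, taken as its complement."""
    return 1.0 - a


# Hypothetical readings: 22 degrees C is "warm" to degree 0.7,
# and 60% humidity is "humid" to degree 0.5.
warm = triangular(22.0, 15.0, 25.0, 35.0)
humid = triangular(60.0, 40.0, 80.0, 100.0)
warm_and_humid = fuzzy_and(warm, humid)
```

Here "warm and humid" is neither true nor false but holds to the degree of the weaker of its two parts, which is the kind of approximation that soft computing techniques are designed to work with.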

Narrow vs. general AI

AI researchers are divided as to whether to pursue the goals of artificial general intelligence and superintelligence directly or to solve as many specific problems as possible (narrow AI) in hopes these solutions will lead indirectly to the field's long-term goals.[413][414] General intelligence is difficult to define and difficult to measure, and modern AI has had more verifiable successes by focusing on specific problems with specific solutions. The sub-field of artificial general intelligence studies this area exclusively.

Machine consciousness, sentience, and mind

There is no settled consensus in philosophy of mind on whether a machine can have a mind, consciousness and mental states in the same sense that human beings do. This issue considers the internal experiences of the machine, rather than its external behavior. Mainstream AI research considers this issue irrelevant because it does not affect the goals of the field: to build machines that can solve problems using intelligence. Russell and Norvig add that "[t]he additional project of making a machine conscious in exactly the way humans are is not one that we are equipped to take on."[415] However, the question has become central to the philosophy of mind. It is also typically the central question at issue in artificial intelligence in fiction.

Consciousness

David Chalmers identified two problems in understanding the mind, which he named the "hard" and "easy" problems of consciousness.[416] The easy problem is understanding how the brain processes signals, makes plans and controls behavior. The hard problem is explaining how this feels or why it should feel like anything at all, assuming we are right in thinking that it truly does feel like something (Dennett's consciousness illusionism says this is an illusion). While human information processing is easy to explain, human subjective experience is difficult to explain. For example, it is easy to imagine a color-blind person who has learned to identify which objects in their field of view are red, but it is not clear what would be required for the person to know what red looks like.[417]

Computationalism and functionalism

Computationalism is the position in the philosophy of mind that the human mind is an information processing system and that thinking is a form of computing. Computationalism argues that the relationship between mind and body is similar or identical to the relationship between software and hardware and thus may be a solution to the mind–body problem. This philosophical position was inspired by the work of AI researchers and cognitive scientists in the 1960s and was originally proposed by philosophers Jerry Fodor and Hilary Putnam.[418]

Philosopher John Searle characterized this position as "strong AI": "The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds."[ac] Searle challenges this claim with his Chinese room argument, which attempts to show that even a computer capable of perfectly simulating human behavior would not have a mind.[422]

AI welfare and rights

It is difficult or impossible to reliably evaluate whether an advanced AI is sentient (has the ability to feel), and if so, to what degree.[423] But if there is a significant chance that a given machine can feel and suffer, then it may be entitled to certain rights or welfare protection measures, similarly to animals.[424][425] Sapience (a set of capacities related to high intelligence, such as discernment or self-awareness) may provide another moral basis for AI rights.[424] Robot rights are also sometimes proposed as a practical way to integrate autonomous agents into society.[426]

In 2017, the European Union considered granting "electronic personhood" to some of the most capable AI systems. Similarly to the legal status of companies, it would have conferred rights but also responsibilities.[427] Critics argued in 2018 that granting rights to AI systems would downplay the importance of human rights, and that legislation should focus on user needs rather than speculative futuristic scenarios. They also noted that robots lacked the autonomy to take part in society on their own.[428][429]

Progress in AI increased interest in the topic. Proponents of AI welfare and rights often argue that AI sentience, if it emerges, would be particularly easy to deny. They warn that this may be a moral blind spot analogous to slavery or factory farming, which could lead to large-scale suffering if sentient AI is created and carelessly exploited.[425][424]

Future

Superintelligence and the singularity

A superintelligence is a hypothetical agent that would possess intelligence far surpassing that of the brightest and most gifted human mind.[414] If research into artificial general intelligence produced sufficiently intelligent software, it might be able to reprogram and improve itself. The improved software would be even better at improving itself, leading to what I. J. Good called an "intelligence explosion" and Vernor Vinge called a "singularity".[430]

However, technologies cannot improve exponentially indefinitely, and typically follow an S-shaped curve, slowing when they reach the physical limits of what the technology can do.[431]
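
The difference between the two growth patterns can be sketched numerically. The example below compares unbounded exponential growth with a logistic (S-shaped) curve that starts from the same value and the same initial growth rate; the ceiling of 1000, the rate of 1.0 and the function names are arbitrary illustrative assumptions, not figures from the cited source.

```python
import math


def exponential(t, rate=1.0, start=1.0):
    """Unbounded exponential growth from `start` at time 0."""
    return start * math.exp(rate * t)


def logistic(t, ceiling=1000.0, rate=1.0, start=1.0):
    """S-shaped (logistic) growth from `start` at time 0.

    Grows almost exponentially while far below `ceiling`,
    then flattens out as it approaches that limit.
    """
    return ceiling / (1.0 + ((ceiling - start) / start) * math.exp(-rate * t))


# Early on the two curves are nearly indistinguishable;
# later the logistic curve levels off near its ceiling.
for t in (0, 1, 5, 10, 20):
    print(t, round(exponential(t), 2), round(logistic(t), 2))
```

Under these assumptions the two curves agree closely at first, which is why early data alone may not reveal whether a trend will continue or saturate.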

Transhumanism

Robot designer Hans Moravec, cyberneticist Kevin Warwick and inventor Ray Kurzweil have predicted that humans and machines may merge in the future into cyborgs that are more capable and powerful than either. This idea, called transhumanism, has roots in the writings of Aldous Huxley and Robert Ettinger.[432]

Edward Fredkin argues that "artificial intelligence is the next step in evolution", an idea first proposed in Samuel Butler's "Darwin among the Machines" as far back as 1863, and expanded upon by George Dyson in his 1998 book Darwin Among the Machines: The Evolution of Global Intelligence.[433]

In fiction

The word "robot" itself was coined by Karel Čapek in his 1921 play R.U.R., the title standing for "Rossum's Universal Robots".

Thought-capable artificial beings have appeared as storytelling devices since antiquity,[434] and have been a persistent theme in science fiction.[435]

A common trope in these works began with Mary Shelley's Frankenstein, where a human creation becomes a threat to its masters. This includes such works as Arthur C. Clarke's and Stanley Kubrick's 2001: A Space Odyssey (both 1968), with HAL 9000, the murderous computer in charge of the Discovery One spaceship, as well as The Terminator (1984) and The Matrix (1999). In contrast, the rare loyal robots such as Gort from The Day the Earth Stood Still (1951) and Bishop from Aliens (1986) are less prominent in popular culture.[436]

Isaac Asimov introduced the Three Laws of Robotics in many stories, most notably with the "Multivac" super-intelligent computer. Asimov's laws are often brought up during lay discussions of machine ethics;[437] while almost all artificial intelligence researchers are familiar with Asimov's laws through popular culture, they generally consider the laws useless for many reasons, one of which is their ambiguity.[438]

Several works use AI to force us to confront the fundamental question of what makes us human, showing us artificial beings that have the ability to feel, and thus to suffer. This appears in Karel Čapek's R.U.R., the films A.I. Artificial Intelligence and Ex Machina, as well as the novel Do Androids Dream of Electric Sheep?, by Philip K. Dick. Dick considers the idea that our understanding of human subjectivity is altered by technology created with artificial intelligence.[439]

Explanatory notes

  1. ^ a b This list of intelligent traits is based on the topics covered by the major AI textbooks, including: Russell & Norvig (2021), Luger & Stubblefield (2004), Poole, Mackworth & Goebel (1998) and Nilsson (1998)
  2. ^ a b This list of tools is based on the topics covered by the major AI textbooks, including: Russell & Norvig (2021), Luger & Stubblefield (2004), Poole, Mackworth & Goebel (1998) and Nilsson (1998)
  3. ^ It is among the reasons that expert systems proved to be inefficient for capturing knowledge.[30][31]
  4. ^ "Rational agent" is a general term used in economics, philosophy and theoretical artificial intelligence. It can refer to anything that directs its behavior to accomplish goals, such as a person, an animal, a corporation, a nation, or in the case of AI, a computer program.
  5. ^ Alan Turing discussed the centrality of learning as early as 1950, in his classic paper "Computing Machinery and Intelligence".[42] In 1956, at the original Dartmouth AI summer conference, Ray Solomonoff wrote a report on unsupervised probabilistic machine learning: "An Inductive Inference Machine".[43]
  6. ^ See AI winter § Machine translation and the ALPAC report of 1966
  7. ^ Compared with symbolic logic, formal Bayesian inference is computationally expensive. For inference to be tractable, most observations must be conditionally independent of one another. AdSense uses a Bayesian network with over 300 million edges to learn which ads to serve.[94]
  8. ^ Expectation–maximization, one of the most popular algorithms in machine learning, allows clustering in the presence of unknown latent variables.[96]
  9. ^ Some forms of deep neural networks (without a specific learning algorithm) were described by: Warren S. McCulloch and Walter Pitts (1943);[117] Alan Turing (1948);[118] Karl Steinbuch and Roger David Joseph (1961).[119] Deep or recurrent networks that learned (or used gradient descent) were developed by: Frank Rosenblatt (1957);[118] Oliver Selfridge (1959);[119] Alexey Ivakhnenko and Valentin Lapa (1965);[120] Kaoru Nakano (1971);[121] Shun-Ichi Amari (1972);[121] John Joseph Hopfield (1982).[121] Precursors to backpropagation were developed by: Henry J. Kelley (1960);[118] Arthur E. Bryson (1962);[118] Stuart Dreyfus (1962);[118] Arthur E. Bryson and Yu-Chi Ho (1969).[118] Backpropagation was independently developed by: Seppo Linnainmaa (1970);[122] Paul Werbos (1974).[118]
  10. ^ Geoffrey Hinton said, of his work on neural networks in the 1990s, "our labeled datasets were thousands of times too small. [And] our computers were millions of times too slow."[123]
  11. ^ In statistics, a bias is a systematic error or deviation from the correct value. But in the context of fairness, it refers to a tendency in favor or against a certain group or individual characteristic, usually in a way that is considered unfair or harmful. A statistically unbiased AI system that produces disparate outcomes for different demographic groups may thus be viewed as biased in the ethical sense.[252]
  12. ^ Including Jon Kleinberg (Cornell University), Sendhil Mullainathan (University of Chicago), Cynthia Chouldechova (Carnegie Mellon) and Sam Corbett-Davies (Stanford)[261]
  13. ^ Moritz Hardt (a director at the Max Planck Institute for Intelligent Systems) argues that machine learning "is fundamentally the wrong tool for a lot of domains, where you're trying to design interventions and mechanisms that change the world."[266]
  14. ^ When the law was passed in 2018, it still contained a form of this provision.
  15. ^ This is the United Nations' definition, and includes things like land mines as well.[280]
  16. ^ See table 4; 9% is both the OECD average and the U.S. average.[291]
  17. ^ Sometimes called a "robopocalypse"[299]
  18. ^ "Electronic brain" was the term used by the press around this time.[353][355]
  19. ^ Daniel Crevier wrote, "the conference is generally recognized as the official birthdate of the new science."[358] Russell and Norvig called the conference "the inception of artificial intelligence."[117]
  20. ^ Russell and Norvig wrote "for the next 20 years the field would be dominated by these people and their students."[359]
  21. ^ Russell and Norvig wrote, "it was astonishing whenever a computer did anything kind of smartish".[360]
  22. ^ The programs described are Arthur Samuel's checkers program for the IBM 701, Daniel Bobrow's STUDENT, Newell and Simon's Logic Theorist and Terry Winograd's SHRDLU.
  23. ^ Russell and Norvig write: "in almost all cases, these early systems failed on more difficult problems"[364]
  24. ^ Embodied approaches to AI[371] were championed by Hans Moravec[372] and Rodney Brooks[373] and went by many names, including nouvelle AI[373] and developmental robotics.[374]
  25. ^ Matteo Wong wrote in The Atlantic: "Whereas for decades, computer-science fields such as natural-language processing, computer vision, and robotics used extremely different methods, now they all use a programming method called "deep learning". As a result, their code and approaches have become more similar, and their models are easier to integrate into one another."[380]
  26. ^ Jack Clark wrote in Bloomberg: "After a half-decade of quiet breakthroughs in artificial intelligence, 2015 has been a landmark year. Computers are smarter and learning faster than ever", and noted that the number of software projects that use machine learning at Google increased from a "sporadic usage" in 2012 to more than 2,700 projects in 2015.[382]
  27. ^ Nils Nilsson wrote in 1983: "Simply put, there is wide disagreement in the field about what AI is all about."[404]
  28. ^ Daniel Crevier wrote that "time has proven the accuracy and perceptiveness of some of Dreyfus's comments. Had he formulated them less aggressively, constructive actions they suggested might have been taken much earlier."[409]
  29. ^ Searle presented this definition of "Strong AI" in 1999.[419] Searle's original formulation was "The appropriately programmed computer really is a mind, in the sense that computers given the right programs can be literally said to understand and have other cognitive states."[420] Strong AI is defined similarly by Russell and Norvig: "Strong AI – the assertion that machines that do so are actually thinking (as opposed to simulating thinking)."[421]

References

  1. ^ a b c Russell & Norvig (2021), pp. 1–4.
  2. ^ "AI set to exceed human brain power". CNN.com. 26 July 2006. Archived from the original on 19 February 2008.
  3. ^ Kaplan, Andreas; Haenlein, Michael (2019). "Siri, Siri, in my hand: Who's the fairest in the land? On the interpretations, illustrations, and implications of artificial intelligence". Business Horizons. 62: 15–25. doi:10.1016/j.bushor.2018.08.004. ISSN 0007-6813. S2CID 158433736.
  4. ^ Russell & Norvig (2021, §1.2).
  5. ^ "Tech companies want to build artificial general intelligence. But who decides when AGI is attained?". AP News. 4 April 2024. Retrieved 20 May 2025.
  6. ^ a b Dartmouth workshop: Russell & Norvig (2021, p. 18), McCorduck (2004, pp. 111–136), NRC (1999, pp. 200–201)
    The proposal: McCarthy et al. (1955)
  7. ^ a b Successful programs of the 1960s: McCorduck (2004, pp. 243–252), Crevier (1993, pp. 52–107), Moravec (1988, p. 9), Russell & Norvig (2021, pp. 19–21)
  8. ^ a b Funding initiatives in the early 1980s: Fifth Generation Project (Japan), Alvey (UK), Microelectronics and Computer Technology Corporation (US), Strategic Computing Initiative (US): McCorduck (2004, pp. 426–441), Crevier (1993, pp. 161–162, 197–203, 211, 240), Russell & Norvig (2021, p. 23), NRC (1999, pp. 210–211), Newquist (1994, pp. 235–248)
  9. ^ a b First AI Winter, Lighthill report, Mansfield Amendment: Crevier (1993, pp. 115–117), Russell & Norvig (2021, pp. 21–22), NRC (1999, pp. 212–213), Howe (1994), Newquist (1994, pp. 189–201)
  10. ^ a b Second AI Winter: Russell & Norvig (2021, p. 24), McCorduck (2004, pp. 430–435), Crevier (1993, pp. 209–210), NRC (1999, pp. 214–216), Newquist (1994, pp. 301–318)
  11. ^ a b Deep learning revolution, AlexNet: Goldman (2022), Russell & Norvig (2021, p. 26), McKinsey (2018)
  12. ^ Toews (2023).
  13. ^ Problem-solving, puzzle solving, game playing, and deduction: Russell & Norvig (2021, chpt. 3–5), Russell & Norvig (2021, chpt. 6) (constraint satisfaction), Poole, Mackworth & Goebel (1998, chpt. 2, 3, 7, 9), Luger & Stubblefield (2004, chpt. 3, 4, 6, 8), Nilsson (1998, chpt. 7–12)
  14. ^ Uncertain reasoning: Russell & Norvig (2021, chpt. 12–18), Poole, Mackworth & Goebel (1998, pp. 345–395), Luger & Stubblefield (2004, pp. 333–381), Nilsson (1998, chpt. 7–12)
  15. ^ a b c Intractability and efficiency and the combinatorial explosion: Russell & Norvig (2021, p. 21)
  16. ^ a b c Psychological evidence of the prevalence of sub-symbolic reasoning and knowledge: Kahneman (2011), Dreyfus & Dreyfus (1986), Wason & Shapiro (1966), Kahneman, Slovic & Tversky (1982)
  17. ^ Knowledge representation and knowledge engineering: Russell & Norvig (2021, chpt. 10), Poole, Mackworth & Goebel (1998, pp. 23–46, 69–81, 169–233, 235–277, 281–298, 319–345), Luger & Stubblefield (2004, pp. 227–243), Nilsson (1998, chpt. 17.1–17.4, 18)
  18. ^ Smoliar & Zhang (1994).
  19. ^ Neumann & Möller (2008).
  20. ^ Kuperman, Reichley & Bailey (2006).
  21. ^ McGarry (2005).
  22. ^ Bertini, Del Bimbo & Torniai (2006).
  23. ^ Russell & Norvig (2021), p. 272.
  24. ^ Representing categories and relations: Semantic networks, description logics, inheritance (including frames, and scripts): Russell & Norvig (2021, §10.2 & 10.5), Poole, Mackworth & Goebel (1998, pp. 174–177), Luger & Stubblefield (2004, pp. 248–258), Nilsson (1998, chpt. 18.3)
  25. ^ Representing events and time: Situation calculus, event calculus, fluent calculus (including solving the frame problem): Russell & Norvig (2021, §10.3), Poole, Mackworth & Goebel (1998, pp. 281–298), Nilsson (1998, chpt. 18.2)
  26. ^ Causal calculus: Poole, Mackworth & Goebel (1998, pp. 335–337)
  27. ^ Representing knowledge about knowledge: Belief calculus, modal logics: Russell & Norvig (2021, §10.4), Poole, Mackworth & Goebel (1998, pp. 275–277)
  28. ^ a b Default reasoning, Frame problem, default logic, non-monotonic logics, circumscription, closed world assumption, abduction: Russell & Norvig (2021, §10.6), Poole, Mackworth & Goebel (1998, pp. 248–256, 323–335), Luger & Stubblefield (2004, pp. 335–363), Nilsson (1998, ~18.3.3) (Poole et al. place abduction under "default reasoning"; Luger et al. place it under "uncertain reasoning").
  29. ^ a b Breadth of commonsense knowledge: Lenat & Guha (1989, Introduction), Crevier (1993, pp. 113–114), Moravec (1988, p. 13), Russell & Norvig (2021, pp. 241, 385, 982) (qualification problem)
  30. ^ Newquist (1994), p. 296.
  31. ^ Crevier (1993), pp. 204–208.
  32. ^ Russell & Norvig (2021), p. 528.
  33. ^ Automated planning: Russell & Norvig (2021, chpt. 11).
  34. ^ Automated decision making, Decision theory: Russell & Norvig (2021, chpt. 16–18).
  35. ^ Classical planning: Russell & Norvig (2021, Section 11.2).
  36. ^ Sensorless or "conformant" planning, contingent planning, replanning (a.k.a. online planning): Russell & Norvig (2021, Section 11.5).
  37. ^ Uncertain preferences: Russell & Norvig (2021, Section 16.7) Inverse reinforcement learning: Russell & Norvig (2021, Section 22.6)
  38. ^ Information value theory: Russell & Norvig (2021, Section 16.6).
  39. ^ Markov decision process: Russell & Norvig (2021, chpt. 17).
  40. ^ Game theory and multi-agent decision theory: Russell & Norvig (2021, chpt. 18).
  41. ^ Learning: Russell & Norvig (2021, chpt. 19–22), Poole, Mackworth & Goebel (1998, pp. 397–438), Luger & Stubblefield (2004, pp. 385–542), Nilsson (1998, chpt. 3.3, 10.3, 17.5, 20)
  42. ^ Turing (1950).
  43. ^ Solomonoff (1956).
  44. ^ Unsupervised learning: Russell & Norvig (2021, p. 653) (definition), Russell & Norvig (2021, pp. 738–740) (cluster analysis), Russell & Norvig (2021, pp. 846–860) (word embedding)
  45. ^ a b Supervised learning: Russell & Norvig (2021, §19.2) (Definition), Russell & Norvig (2021, Chpt. 19–20) (Techniques)
  46. ^ Reinforcement learning: Russell & Norvig (2021, chpt. 22), Luger & Stubblefield (2004, pp. 442–449)
  47. ^ Transfer learning: Russell & Norvig (2021, p. 281), The Economist (2016)
  48. ^ "Artificial Intelligence (AI): What Is AI and How Does It Work? | Built In". builtin.com. Retrieved 30 October 2023.
  49. ^ Computational learning theory: Russell & Norvig (2021, pp. 672–674), Jordan & Mitchell (2015)
  50. ^ Natural language processing (NLP): Russell & Norvig (2021, chpt. 23–24), Poole, Mackworth & Goebel (1998, pp. 91–104), Luger & Stubblefield (2004, pp. 591–632)
  51. ^ Subproblems of NLP: Russell & Norvig (2021, pp. 849–850)
  52. ^ Russell & Norvig (2021), pp. 856–858.
  53. ^ Dickson (2022).
  54. ^ Modern statistical and deep learning approaches to NLP: Russell & Norvig (2021, chpt. 24), Cambria & White (2014)
  55. ^ Vincent (2019).
  56. ^ Russell & Norvig (2021), pp. 875–878.
  57. ^ Bushwick (2023).
  58. ^ Computer vision: Russell & Norvig (2021, chpt. 25), Nilsson (1998, chpt. 6)
  59. ^ Russell & Norvig (2021), pp. 849–850.
  60. ^ Russell & Norvig (2021), pp. 895–899.
  61. ^ Russell & Norvig (2021), pp. 899–901.
  62. ^ Challa et al. (2011).
  63. ^ Russell & Norvig (2021), pp. 931–938.
  64. ^ MIT AIL (2014).
  65. ^ Affective computing: Thro (1993), Edelson (1991), Tao & Tan (2005), Scassellati (2002)
  66. ^ Waddell (2018).
  67. ^ Poria et al. (2017).
  68. ^ a b Artificial general intelligence: Russell & Norvig (2021, pp. 32–33, 1020–1021)
    Proposal for the modern version: Pennachin & Goertzel (2007)
    Warnings of overspecialization in AI from leading researchers: Nilsson (1995), McCarthy (2007), Beal & Winston (2009)
  69. ^ Search algorithms: Russell & Norvig (2021, chpts. 3–5), Poole, Mackworth & Goebel (1998, pp. 113–163), Luger & Stubblefield (2004, pp. 79–164, 193–219), Nilsson (1998, chpts. 7–12)
  70. ^ State space search: Russell & Norvig (2021, chpt. 3)
  71. ^ Russell & Norvig (2021), sect. 11.2.
  72. ^ Uninformed searches (breadth first search, depth-first search and general state space search): Russell & Norvig (2021, sect. 3.4), Poole, Mackworth & Goebel (1998, pp. 113–132), Luger & Stubblefield (2004, pp. 79–121), Nilsson (1998, chpt. 8)
  73. ^ Heuristic or informed searches (e.g., greedy best first and A*): Russell & Norvig (2021, sect. 3.5), Poole, Mackworth & Goebel (1998, pp. 132–147), Poole & Mackworth (2017, sect. 3.6), Luger & Stubblefield (2004, pp. 133–150)
  74. ^ Adversarial search: Russell & Norvig (2021, chpt. 5)
  75. ^ Local or "optimization" search: Russell & Norvig (2021, chpt. 4)
  76. ^ Singh Chauhan, Nagesh (18 December 2020). "Optimization Algorithms in Neural Networks". KDnuggets. Retrieved 13 January 2024.
  77. ^ Evolutionary computation: Russell & Norvig (2021, sect. 4.1.2)
  78. ^ Merkle & Middendorf (2013).
  79. ^ Logic: Russell & Norvig (2021, chpts. 6–9), Luger & Stubblefield (2004, pp. 35–77), Nilsson (1998, chpt. 13–16)
  80. ^ Propositional logic: Russell & Norvig (2021, chpt. 6), Luger & Stubblefield (2004, pp. 45–50), Nilsson (1998, chpt. 13)
  81. ^ First-order logic and features such as equality: Russell & Norvig (2021, chpt. 7), Poole, Mackworth & Goebel (1998, pp. 268–275), Luger & Stubblefield (2004, pp. 50–62), Nilsson (1998, chpt. 15)
  82. ^ Logical inference: Russell & Norvig (2021, chpt. 10)
  83. ^ Logical deduction as search: Russell & Norvig (2021, sects. 9.3, 9.4), Poole, Mackworth & Goebel (1998, pp. ~46–52), Luger & Stubblefield (2004, pp. 62–73), Nilsson (1998, chpt. 4.2, 7.2)
  84. ^ Resolution and unification: Russell & Norvig (2021, sections 7.5.2, 9.2, 9.5)
  85. ^ Warren, D.H.; Pereira, L.M.; Pereira, F. (1977). "Prolog-the language and its implementation compared with Lisp". ACM SIGPLAN Notices. 12 (8): 109–115. doi:10.1145/872734.806939.
  86. ^ Fuzzy logic: Russell & Norvig (2021, pp. 214, 255, 459), Scientific American (1999)
  87. ^ a b Stochastic methods for uncertain reasoning: Russell & Norvig (2021, chpt. 12–18, 20), Poole, Mackworth & Goebel (1998, pp. 345–395), Luger & Stubblefield (2004, pp. 165–191, 333–381), Nilsson (1998, chpt. 19)
  88. ^ Decision theory and decision analysis: Russell & Norvig (2021, chpt. 16–18), Poole, Mackworth & Goebel (1998, pp. 381–394)
  89. ^ Information value theory: Russell & Norvig (2021, sect. 16.6)
  90. ^ Markov decision processes and dynamic decision networks: Russell & Norvig (2021, chpt. 17)
  91. ^ a b c Stochastic temporal models: Russell & Norvig (2021, chpt. 14) Hidden Markov model: Russell & Norvig (2021, sect. 14.3) Kalman filters: Russell & Norvig (2021, sect. 14.4) Dynamic Bayesian networks: Russell & Norvig (2021, sect. 14.5)
  92. ^ Game theory and mechanism design: Russell & Norvig (2021, chpt. 18)
  93. ^ Bayesian networks: Russell & Norvig (2021, sects. 12.5–12.6, 13.4–13.5, 14.3–14.5, 16.5, 20.2–20.3), Poole, Mackworth & Goebel (1998, pp. 361–381), Luger & Stubblefield (2004, pp. ~182–190, ≈363–379), Nilsson (1998, chpt. 19.3–19.4)
  94. ^ Domingos (2015), chpt. 6.
  95. ^ Bayesian inference algorithm: Russell & Norvig (2021, sect. 13.3–13.5), Poole, Mackworth & Goebel (1998, pp. 361–381), Luger & Stubblefield (2004, pp. ~363–379), Nilsson (1998, chpt. 19.4 & 7)
  96. ^ Domingos (2015), p. 210.
  97. ^ Bayesian learning and the expectation–maximization algorithm: Russell & Norvig (2021, chpt. 20), Poole, Mackworth & Goebel (1998, pp. 424–433), Nilsson (1998, chpt. 20), Domingos (2015, p. 210)
  98. ^ Bayesian decision theory and Bayesian decision networks: Russell & Norvig (2021, sect. 16.5)
  99. ^ Statistical learning methods and classifiers: Russell & Norvig (2021, chpt. 20)
  100. ^ Ciaramella, Alberto; Ciaramella, Marco (2024). Introduction to Artificial Intelligence: from data analysis to generative AI. Intellisemantic Editions. ISBN 978-8-8947-8760-3.
  101. ^ Decision trees: Russell & Norvig (2021, sect. 19.3), Domingos (2015, p. 88)
  102. ^ Non-parametric learning models such as K-nearest neighbor and support vector machines: Russell & Norvig (2021, sect. 19.7), Domingos (2015, p. 187) (k-nearest neighbor)
  103. ^ Domingos (2015), p. 152.
  104. ^ Naive Bayes classifier: Russell & Norvig (2021, sect. 12.6), Domingos (2015, p. 152)
  105. ^ a b Neural networks: Russell & Norvig (2021, chpt. 21), Domingos (2015, Chapter 4)
  106. ^ Gradient calculation in computational graphs, backpropagation, automatic differentiation: Russell & Norvig (2021, sect. 21.2), Luger & Stubblefield (2004, pp. 467–474), Nilsson (1998, chpt. 3.3)
  107. ^ Universal approximation theorem: Russell & Norvig (2021, p. 752) The theorem: Cybenko (1988), Hornik, Stinchcombe & White (1989)
  108. ^ Feedforward neural networks: Russell & Norvig (2021, sect. 21.1)
  109. ^ Perceptrons: Russell & Norvig (2021, pp. 21, 22, 683)
  110. ^ a b Deep learning: Russell & Norvig (2021, chpt. 21), Goodfellow, Bengio & Courville (2016), Hinton et al. (2016), Schmidhuber (2015)
  111. ^ Recurrent neural networks: Russell & Norvig (2021, sect. 21.6)
  112. ^ Convolutional neural networks: Russell & Norvig (2021, sect. 21.3)
  113. ^ Sindhu V, Nivedha S, Prakash M (February 2020). "An Empirical Science Research on Bioinformatics in Machine Learning". Journal of Mechanics of Continua and Mathematical Sciences (7). doi:10.26782/jmcms.spl.7/2020.02.00006.
  114. ^ Deng & Yu (2014), pp. 199–200.
  115. ^ Ciresan, Meier & Schmidhuber (2012).
  116. ^ Russell & Norvig (2021), p. 750.
  117. ^ a b c Russell & Norvig (2021), p. 17.
  118. ^ a b c d e f g Russell & Norvig (2021), p. 785.
  119. ^ a b Schmidhuber (2022), sect. 5.
  120. ^ Schmidhuber (2022), sect. 6.
  121. ^ a b c Schmidhuber (2022), sect. 7.
  122. ^ Schmidhuber (2022), sect. 8.
  123. ^ Quoted in Christian (2020, p. 22)
  124. ^ Metz, Cade; Weise, Karen (5 May 2025). "A.I. Hallucinations Are Getting Worse, Even as New Systems Become More Powerful". The New York Times. ISSN 0362-4331. Retrieved 6 May 2025.
  125. ^ Smith (2023).
  126. ^ "Explained: Generative AI". 9 November 2023.
  127. ^ "AI Writing and Content Creation Tools". MIT Sloan Teaching & Learning Technologies. Archived from the original on 25 December 2023. Retrieved 25 December 2023.
  128. ^ Marmouyet (2023).
  129. ^ Kobielus (2019).
  130. ^ Thomason, James (21 May 2024). "Mojo Rising: The resurgence of AI-first programming languages". VentureBeat. Archived from the original on 27 June 2024. Retrieved 26 May 2024.
  131. ^ Wodecki, Ben (5 May 2023). "7 AI Programming Languages You Need to Know". AI Business. Archived from the original on 25 July 2024. Retrieved 5 October 2024.
  132. ^ Plumb, Taryn (18 September 2024). "Why Jensen Huang and Marc Benioff see 'gigantic' opportunity for agentic AI". VentureBeat. Archived from the original on 5 October 2024. Retrieved 4 October 2024.
  133. ^ Mims, Christopher (19 September 2020). "Huang's Law Is the New Moore's Law, and Explains Why Nvidia Wants Arm". Wall Street Journal. ISSN 0099-9660. Archived from the original on 2 October 2023. Retrieved 19 January 2025.
  134. ^ Davenport, T; Kalakota, R (June 2019). "The potential for artificial intelligence in healthcare". Future Healthc J. 6 (2): 94–98. doi:10.7861/futurehosp.6-2-94. PMC 6616181. PMID 31363513.
  135. ^ Lyakhova, U.A.; Lyakhov, P.A. (2024). "Systematic review of approaches to detection and classification of skin cancer using artificial intelligence: Development and prospects". Computers in Biology and Medicine. 178: 108742. doi:10.1016/j.compbiomed.2024.108742. PMID 38875908. Archived from the original on 3 December 2024. Retrieved 10 October 2024.
  136. ^ Alqudaihi, Kawther S.; Aslam, Nida; Khan, Irfan Ullah; Almuhaideb, Abdullah M.; Alsunaidi, Shikah J.; Ibrahim, Nehad M. Abdel Rahman; Alhaidari, Fahd A.; Shaikh, Fatema S.; Alsenbel, Yasmine M.; Alalharith, Dima M.; Alharthi, Hajar M.; Alghamdi, Wejdan M.; Alshahrani, Mohammed S. (2021). "Cough Sound Detection and Diagnosis Using Artificial Intelligence Techniques: Challenges and Opportunities". IEEE Access. 9: 102327–102344. Bibcode:2021IEEEA...9j2327A. doi:10.1109/ACCESS.2021.3097559. ISSN 2169-3536. PMC 8545201. PMID 34786317.
  137. ^ a b Bax, Monique; Thorpe, Jordan; Romanov, Valentin (December 2023). "The future of personalized cardiovascular medicine demands 3D and 4D printing, stem cells, and artificial intelligence". Frontiers in Sensors. 4. doi:10.3389/fsens.2023.1294721. ISSN 2673-5067.
  138. ^ Dankwa-Mullan, Irene (2024). "Health Equity and Ethical Considerations in Using Artificial Intelligence in Public Health and Medicine". Preventing Chronic Disease. 21: E64. doi:10.5888/pcd21.240245. ISSN 1545-1151. PMC 11364282. PMID 39173183.
  139. ^ Jumper, J; Evans, R; Pritzel, A (2021). "Highly accurate protein structure prediction with AlphaFold". Nature. 596 (7873): 583–589. Bibcode:2021Natur.596..583J. doi:10.1038/s41586-021-03819-2. PMC 8371605. PMID 34265844.
  140. ^ "AI discovers new class of antibiotics to kill drug-resistant bacteria". 20 December 2023. Archived from the original on 16 September 2024. Retrieved 5 October 2024.
  141. ^ "AI speeds up drug design for Parkinson's ten-fold". Cambridge University. 17 April 2024. Archived from the original on 5 October 2024. Retrieved 5 October 2024.
  142. ^ Horne, Robert I.; Andrzejewska, Ewa A.; Alam, Parvez; Brotzakis, Z. Faidon; Srivastava, Ankit; Aubert, Alice; Nowinska, Magdalena; Gregory, Rebecca C.; Staats, Roxine; Possenti, Andrea; Chia, Sean; Sormanni, Pietro; Ghetti, Bernardino; Caughey, Byron; Knowles, Tuomas P. J.; Vendruscolo, Michele (17 April 2024). "Discovery of potent inhibitors of α-synuclein aggregation using structure-based iterative learning". Nature Chemical Biology. 20 (5). Nature: 634–645. doi:10.1038/s41589-024-01580-x. PMC 11062903. PMID 38632492.
  143. ^ Grant, Eugene F.; Lardner, Rex (25 July 1952). "The Talk of the Town – It". The New Yorker. ISSN 0028-792X. Archived from the original on 16 February 2020. Retrieved 28 January 2024.
  144. ^ Anderson, Mark Robert (11 May 2017). "Twenty years on from Deep Blue vs Kasparov: how a chess match started the big data revolution". The Conversation. Archived from the original on 17 September 2024. Retrieved 28 January 2024.
  145. ^ Markoff, John (16 February 2011). "Computer Wins on 'Jeopardy!': Trivial, It's Not". The New York Times. ISSN 0362-4331. Archived from the original on 22 October 2014. Retrieved 28 January 2024.
  146. ^ Byford, Sam (27 May 2017). "AlphaGo retires from competitive Go after defeating world number one 3–0". The Verge. Archived from the original on 7 June 2017. Retrieved 28 January 2024.
  147. ^ Brown, Noam; Sandholm, Tuomas (30 August 2019). "Superhuman AI for multiplayer poker". Science. 365 (6456): 885–890. Bibcode:2019Sci...365..885B. doi:10.1126/science.aay2400. ISSN 0036-8075. PMID 31296650.
  148. ^ "MuZero: Mastering Go, chess, shogi and Atari without rules". Google DeepMind. 23 December 2020. Retrieved 28 January 2024.
  149. ^ Sample, Ian (30 October 2019). "AI becomes grandmaster in 'fiendishly complex' StarCraft II". The Guardian. ISSN 0261-3077. Archived from the original on 29 December 2020. Retrieved 28 January 2024.
  150. ^ Wurman, P. R.; Barrett, S.; Kawamoto, K. (2022). "Outracing champion Gran Turismo drivers with deep reinforcement learning" (PDF). Nature. 602 (7896): 223–228. Bibcode:2022Natur.602..223W. doi:10.1038/s41586-021-04357-7. PMID 35140384.
  151. ^ Wilkins, Alex (13 March 2024). "Google AI learns to play open-world video games by watching them". New Scientist. Archived from the original on 26 July 2024. Retrieved 21 July 2024.
  152. ^ Wu, Zhengxuan; Arora, Aryaman; Wang, Zheng; Geiger, Atticus; Jurafsky, Dan; Manning, Christopher D.; Potts, Christopher (2024). "ReFT: Representation Finetuning for Language Models". NeurIPS. arXiv:2404.03592.
  153. ^ "Improving mathematical reasoning with process supervision". OpenAI. 31 May 2023. Retrieved 26 January 2025.
  154. ^ Srivastava, Saurabh (29 February 2024). "Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap". arXiv:2402.19450 [cs.AI].
  155. ^ Lightman, Hunter; Kosaraju, Vineet; Burda, Yura; Edwards, Harri; Baker, Bowen; Lee, Teddy; Leike, Jan; Schulman, John; Sutskever, Ilya; Cobbe, Karl (2023). "Let's Verify Step by Step". arXiv:2305.20050v1 [cs.LG].
  156. ^ Franzen, Carl (8 August 2024). "Alibaba claims no. 1 spot in AI math models with Qwen2-Math". VentureBeat. Retrieved 16 February 2025.
  157. ^ Franzen, Carl (9 January 2025). "Microsoft's new rStar-Math technique upgrades small models to outperform OpenAI's o1-preview at math problems". VentureBeat. Retrieved 26 January 2025.
  158. ^ Genkina, Gina (14 May 2025). "New AI Model Advances the 'Kissing Problem' and More: AlphaEvolve made several mathematical discoveries and practical optimizations". IEEE Spectrum. Retrieved 7 June 2025.
  159. ^ Roberts, Siobhan (25 July 2024). "AI achieves silver-medal standard solving International Mathematical Olympiad problems". The New York Times. Archived from the original on 26 September 2024. Retrieved 7 August 2024.
  160. ^ Azerbayev, Zhangir; Schoelkopf, Hailey; Paster, Keiran; Santos, Marco Dos; McAleer, Stephen; Jiang, Albert Q.; Deng, Jia; Biderman, Stella; Welleck, Sean (16 October 2023). "Llemma: An Open Language Model For Mathematics". EleutherAI Blog. Retrieved 26 January 2025.
  161. ^ "Julius AI". julius.ai.
  162. ^ Metz, Cade (21 July 2025). "Google A.I. System Wins Gold Medal in International Math Olympiad". The New York Times. ISSN 0362-4331. Retrieved 24 July 2025.
  163. ^ McFarland, Alex (12 July 2024). "8 Best AI for Math Tools (January 2025)". Unite.AI. Retrieved 26 January 2025.
  164. ^ Finio, Matthew; Downie, Amanda (8 December 2023). "What is Artificial Intelligence (AI) in Finance?". IBM Think 2024 Primer.
  165. ^ M. Nicolas, J. Firzli: Pensions Age / European Pensions magazine, "Artificial Intelligence: Ask the Industry", May–June 2024. https://videovoice.org/ai-in-finance-innovation-entrepreneurship-vs-over-regulation-with-the-eus-artificial-intelligence-act-wont-work-as-intended/ Archived 11 September 2024 at the Wayback Machine.
  166. ^ a b c Congressional Research Service (2019). Artificial Intelligence and National Security (PDF). Washington, DC: Congressional Research Service. Archived (PDF) from the original on 8 May 2020. Retrieved 25 February 2024.
  167. ^ a b Slyusar, Vadym (2019). Artificial intelligence as the basis of future control networks (Preprint). doi:10.13140/RG.2.2.30247.50087.
  168. ^ Iraqi, Amjad (3 April 2024). "'Lavender': The AI machine directing Israel's bombing spree in Gaza". +972 Magazine. Archived from the original on 10 October 2024. Retrieved 6 April 2024.
  169. ^ Davies, Harry; McKernan, Bethan; Sabbagh, Dan (1 December 2023). "'The Gospel': how Israel uses AI to select bombing targets in Gaza". The Guardian. Archived from the original on 6 December 2023. Retrieved 4 December 2023.
  170. ^ Marti, J Werner (10 August 2024). "Drohnen haben den Krieg in der Ukraine revolutioniert, doch sie sind empfindlich auf Störsender – deshalb sollen sie jetzt autonom operieren" [Drones have revolutionized the war in Ukraine, but they are vulnerable to jammers, which is why they are now meant to operate autonomously]. Neue Zürcher Zeitung (in German). Archived from the original on 10 August 2024. Retrieved 10 August 2024.
  171. ^ Newsom, Gavin; Weber, Shirley N. (5 September 2023). "Executive Order N-12-23" (PDF). Executive Department, State of California. Archived (PDF) from the original on 21 February 2024. Retrieved 7 September 2023.
  172. ^ Pinaya, Walter H. L.; Graham, Mark S.; Kerfoot, Eric; Tudosiu, Petru-Daniel; Dafflon, Jessica; Fernandez, Virginia; Sanchez, Pedro; Wolleb, Julia; da Costa, Pedro F.; Patel, Ashay (2023). "Generative AI for Medical Imaging: extending the MONAI Framework". arXiv:2307.15208 [eess.IV].
  173. ^ "What is ChatGPT, DALL-E, and generative AI?". McKinsey. Archived from the original on 23 April 2023. Retrieved 14 December 2024.
  174. ^ "What is generative AI?". IBM. 22 March 2024. Archived from the original on 13 December 2024. Retrieved 13 December 2024.
  175. ^ Pasick, Adam (27 March 2023). "Artificial Intelligence Glossary: Neural Networks and Other Terms Explained". The New York Times. ISSN 0362-4331. Archived from the original on 1 September 2023. Retrieved 22 April 2023.
  176. ^ Karpathy, Andrej; Abbeel, Pieter; Brockman, Greg; Chen, Peter; Cheung, Vicki; Duan, Yan; Goodfellow, Ian; Kingma, Durk; Ho, Jonathan; Rein Houthooft; Tim Salimans; John Schulman; Ilya Sutskever; Wojciech Zaremba (16 June 2016). "Generative models". OpenAI. Archived from the original on 17 November 2023. Retrieved 15 March 2023.
  177. ^ a b Griffith, Erin; Metz, Cade (27 January 2023). "Anthropic Said to Be Closing In on $300 Million in New A.I. Funding". The New York Times. Archived from the original on 9 December 2023. Retrieved 14 March 2023.
  178. ^ Lanxon, Nate; Bass, Dina; Davalos, Jackie (10 March 2023). "A Cheat Sheet to AI Buzzwords and Their Meanings". Bloomberg News. Archived from the original on 17 November 2023. Retrieved 14 March 2023.
  179. ^ Metz, Cade (14 March 2023). "OpenAI Plans to Up the Ante in Tech's A.I. Race". The New York Times. ISSN 0362-4331. Archived from the original on 31 March 2023. Retrieved 31 March 2023.
  180. ^ Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv (20 January 2022). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239 [cs.CL].
  181. ^ Roose, Kevin (21 October 2022). "A Coming-Out Party for Generative A.I., Silicon Valley's New Craze". The New York Times. Archived from the original on 15 February 2023. Retrieved 14 March 2023.
  182. ^ Metz, Cade (15 February 2024). "OpenAI Unveils A.I. That Instantly Generates Eye-Popping Videos". The New York Times. ISSN 0362-4331. Archived from the original on 15 February 2024. Retrieved 16 February 2024.
  183. ^ Fink, Charlie. "LTX Video Breaks The 60-Second Barrier, Redefining AI Video As A Longform Medium". Forbes. Retrieved 24 July 2025.
  184. ^ "The race of the AI labs heats up". The Economist. 30 January 2023. Archived from the original on 17 November 2023. Retrieved 14 March 2023.
  185. ^ Yang, June; Gokturk, Burak (14 March 2023). "Google Cloud brings generative AI to developers, businesses, and governments". Archived from the original on 17 November 2023. Retrieved 15 March 2023.
  186. ^ Taeihagh, Araz (4 April 2025). "Governance of Generative AI". Policy and Society. 44 (1): 1–22. doi:10.1093/polsoc/puaf001. ISSN 1449-4035.
  187. ^ Simon, Felix M.; Altay, Sacha; Mercier, Hugo (18 October 2023). "Misinformation reloaded? Fears about the impact of generative AI on misinformation are overblown" (PDF). Harvard Kennedy School Misinformation Review. doi:10.37016/mr-2020-127. S2CID 264113883. Retrieved 16 November 2023.
  188. ^ Hendrix, Justin (16 May 2023). "Transcript: Senate Judiciary Subcommittee Hearing on Oversight of AI". techpolicy.press. Archived from the original on 17 November 2023. Retrieved 19 May 2023.
  189. ^ "New AI systems collide with copyright law". BBC News. 1 August 2023. Retrieved 28 September 2024.
  190. ^ Poole, David; Mackworth, Alan (2023). Artificial Intelligence, Foundations of Computational Agents (3rd ed.). Cambridge University Press. doi:10.1017/9781009258227. ISBN 978-1-0092-5819-7. Archived from the original on 5 October 2024. Retrieved 5 October 2024.
  191. ^ Russell, Stuart; Norvig, Peter (2020). Artificial Intelligence: A Modern Approach (4th ed.). Pearson. ISBN 978-0-1346-1099-3.
  192. ^ "Why agents are the next frontier of generative AI". McKinsey Digital. 24 July 2024. Archived from the original on 3 October 2024. Retrieved 10 August 2024.
  193. ^ Figueiredo, Mayara Costa; Ankrah, Elizabeth; Powell, Jacquelyn E.; Epstein, Daniel A.; Chen, Yunan (12 January 2024). "Powered by AI: Examining How AI Descriptions Influence Perceptions of Fertility Tracking Applications". Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies. 7 (4): 1–24. doi:10.1145/3631414.
  194. ^ Power, Jennifer; Pym, Tinonee; James, Alexandra; Waling, Andrea (5 July 2024). "Smart Sex Toys: A Narrative Review of Recent Research on Cultural, Health and Safety Considerations". Current Sexual Health Reports. 16 (3): 199–215. doi:10.1007/s11930-024-00392-3. ISSN 1548-3592.
  195. ^ Marcantonio, Tiffany L.; Avery, Gracie; Thrash, Anna; Leone, Ruschelle M. (10 September 2024). "Large Language Models in an App: Conducting a Qualitative Synthetic Data Analysis of How Snapchat's "My AI" Responds to Questions About Sexual Consent, Sexual Refusals, Sexual Assault, and Sexting". The Journal of Sex Research: 1–15. doi:10.1080/00224499.2024.2396457. ISSN 0022-4499. PMC 11891083. PMID 39254628. Archived from the original on 9 December 2024. Retrieved 9 December 2024.
  196. ^ Hanson, Kenneth R.; Bolthouse, Hannah (2024). ""Replika Removing Erotic Role-Play Is Like Grand Theft Auto Removing Guns or Cars": Reddit Discourse on Artificial Intelligence Chatbots and Sexual Technologies". Socius: Sociological Research for a Dynamic World. 10. doi:10.1177/23780231241259627. ISSN 2378-0231.
  197. ^ Mania, Karolina (1 January 2024). "Legal Protection of Revenge and Deepfake Porn Victims in the European Union: Findings From a Comparative Legal Study". Trauma, Violence, & Abuse. 25 (1): 117–129. doi:10.1177/15248380221143772. ISSN 1524-8380. PMID 36565267.
  198. ^ Singh, Suyesha; Nambiar, Vaishnavi (2024). "Role of Artificial Intelligence in the Prevention of Online Child Sexual Abuse: A Systematic Review of Literature". Journal of Applied Security Research. 19 (4): 586–627. doi:10.1080/19361610.2024.2331885. ISSN 1936-1610. Archived from the original on 9 December 2024. Retrieved 9 December 2024.
  199. ^ Razi, Afsaneh; Kim, Seunghyun; Alsoubai, Ashwaq; Stringhini, Gianluca; Solorio, Thamar; De Choudhury, Munmun; Wisniewski, Pamela J. (13 October 2021). "A Human-Centered Systematic Literature Review of the Computational Approaches for Online Sexual Risk Detection". Proceedings of the ACM on Human-Computer Interaction. 5 (CSCW2): 1–38. doi:10.1145/3479609. ISSN 2573-0142. Archived from the original on 9 December 2024. Retrieved 9 December 2024.
  200. ^ Ransbotham, Sam; Kiron, David; Gerbert, Philipp; Reeves, Martin (6 September 2017). "Reshaping Business With Artificial Intelligence". MIT Sloan Management Review. Archived from the original on 13 February 2024.
  201. ^ Sun, Yuran; Zhao, Xilei; Lovreglio, Ruggiero; Kuligowski, Erica (1 January 2024), Naser, M. Z. (ed.), "8 – AI for large-scale evacuation modeling: promises and challenges", Interpretable Machine Learning for the Analysis, Design, Assessment, and Informed Decision Making for Civil Infrastructure, Woodhead Publishing Series in Civil and Structural Engineering, Woodhead Publishing, pp. 185–204, ISBN 978-0-1282-4073-1, archived from the original on 19 May 2024, retrieved 28 June 2024.
  202. ^ Gomaa, Islam; Adelzadeh, Masoud; Gwynne, Steven; Spencer, Bruce; Ko, Yoon; Bénichou, Noureddine; Ma, Chunyun; Elsagan, Nour; Duong, Dana; Zalok, Ehab; Kinateder, Max (1 November 2021). "A Framework for Intelligent Fire Detection and Evacuation System". Fire Technology. 57 (6): 3179–3185. doi:10.1007/s10694-021-01157-3. ISSN 1572-8099. Archived from the original on 5 October 2024. Retrieved 5 October 2024.
  203. ^ Zhao, Xilei; Lovreglio, Ruggiero; Nilsson, Daniel (1 May 2020). "Modelling and interpreting pre-evacuation decision-making using machine learning". Automation in Construction. 113: 103140. doi:10.1016/j.autcon.2020.103140. hdl:10179/17315. ISSN 0926-5805. Archived from the original on 19 May 2024. Retrieved 5 October 2024.
  204. ^ "India's latest election embraced AI technology. Here are some ways it was used constructively". PBS News. 12 June 2024. Archived from the original on 17 September 2024. Retrieved 28 October 2024.
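  205. ^ "Экономист Дарон Асемоглу написал книгу об угрозах искусственного интеллекта — и о том, как правильное управление может обратить его на пользу человечеству Спецкор "Медузы" Маргарита Лютова узнала у ученого, как скоро мир сможет приблизиться к этой утопии" [Economist Daron Acemoglu has written a book about the threats of artificial intelligence and about how proper governance can turn it to humanity's benefit. Meduza special correspondent Margarita Lyutova asked the scholar how soon the world could approach this utopia]. Meduza (in Russian). Archived from the original on 20 June 2023. Retrieved 21 June 2023.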
  206. ^ "Learning, thinking, artistic collaboration and other such human endeavours in the age of AI". The Hindu. 2 June 2023. Archived from the original on 21 June 2023. Retrieved 21 June 2023.
  207. ^ Müller, Vincent C. (30 April 2020). "Ethics of Artificial Intelligence and Robotics". Stanford Encyclopedia of Philosophy Archive. Archived from the original on 5 October 2024. Retrieved 5 October 2024.
  208. ^ Simonite (2016).
  209. ^ Russell & Norvig (2021), p. 987.
  210. ^ "Assessing potential future artificial intelligence risks, benefits and policy imperatives". OECD. 14 November 2024. Retrieved 1 August 2025.
  211. ^ Laskowski (2023).
  212. ^ GAO (2022).
  213. ^ Valinsky (2019).
  214. ^ Russell & Norvig (2021), p. 991.
  215. ^ Russell & Norvig (2021), pp. 991–992.
  216. ^ Christian (2020), p. 63.
  217. ^ Vincent (2022).
  218. ^ Kopel, Matthew. "Copyright Services: Fair Use". Cornell University Library. Archived from the original on 26 September 2024. Retrieved 26 April 2024.
  219. ^ Burgess, Matt. "How to Stop Your Data From Being Used to Train AI". Wired. ISSN 1059-1028. Archived from the original on 3 October 2024. Retrieved 26 April 2024.
  220. ^ Reisner (2023).
  221. ^ Alter & Harris (2023).
  222. ^ "Getting the Innovation Ecosystem Ready for AI. An IP policy toolkit" (PDF). WIPO.
  223. ^ Hammond, George (27 December 2023). "Big Tech is spending more than VC firms on AI startups". Ars Technica. Archived from the original on 10 January 2024.
  224. ^ Wong, Matteo (24 October 2023). "The Future of AI Is GOMA". The Atlantic. Archived from the original on 5 January 2024.
  225. ^ "Big tech and the pursuit of AI dominance". The Economist. 26 March 2023. Archived from the original on 29 December 2023.
  226. ^ Fung, Brian (19 December 2023). "Where the battle to dominate AI may be won". CNN Business. Archived from the original on 13 January 2024.
  227. ^ Metz, Cade (5 July 2023). "In the Age of A.I., Tech's Little Guys Need Big Friends". The New York Times. Archived from the original on 8 July 2024. Retrieved 5 October 2024.
  228. ^ "Electricity 2024 – Analysis". IEA. 24 January 2024. Retrieved 13 July 2024.
  229. ^ Calvert, Brian (28 March 2024). "AI already uses as much energy as a small country. It's only the beginning". Vox. New York, New York. Archived from the original on 3 July 2024. Retrieved 5 October 2024.
  230. ^ Halper, Evan; O'Donovan, Caroline (21 June 2024). "AI is exhausting the power grid. Tech firms are seeking a miracle solution". Washington Post.
  231. ^ Davenport, Carly. "AI Data Centers and the Coming US Power Demand Surge" (PDF). Goldman Sachs. Archived from the original (PDF) on 26 July 2024. Retrieved 5 October 2024.
  232. ^ Ryan, Carol (12 April 2024). "Energy-Guzzling AI Is Also the Future of Energy Savings". Wall Street Journal. Dow Jones.
  233. ^ Hiller, Jennifer (1 July 2024). "Tech Industry Wants to Lock Up Nuclear Power for AI". Wall Street Journal. Dow Jones. Archived from the original on 5 October 2024. Retrieved 5 October 2024.
  234. ^ Kendall, Tyler (28 September 2024). "Nvidia's Huang Says Nuclear Power an Option to Feed Data Centers". Bloomberg.
  235. ^ Halper, Evan (20 September 2024). "Microsoft deal would reopen Three Mile Island nuclear plant to power AI". Washington Post.
  236. ^ Hiller, Jennifer (20 September 2024). "Three Mile Island's Nuclear Plant to Reopen, Help Power Microsoft's AI Centers". Wall Street Journal. Dow Jones. Archived from the original on 5 October 2024. Retrieved 5 October 2024.
  237. ^ a b c Yadav, Niva (19 August 2024). "Taiwan to stop large data centers in the North, cites insufficient power". DatacenterDynamics. Archived from the original on 8 November 2024. Retrieved 7 November 2024.
  238. ^ a b Mochizuki, Takashi; Oda, Shoko (18 October 2024). "エヌビディア出資の日本企業、原発近くでAIデータセンター新設検討" [Nvidia-backed Japanese company considers building a new AI data center near a nuclear plant]. Bloomberg (in Japanese). Archived from the original on 8 November 2024. Retrieved 7 November 2024.
  239. ^ a b Malik, Naureen S.; Wade, Will (5 November 2024). "Nuclear-Hungry AI Campuses Need New Plan to Find Power Fast". Bloomberg.
  240. ^ "Energy and AI Executive summary". International Energy Agency. Retrieved 10 April 2025.
  241. ^ Nicas (2018).
  242. ^ Rainie, Lee; Keeter, Scott; Perrin, Andrew (22 July 2019). "Trust and Distrust in America". Pew Research Center. Archived from the original on 22 February 2024.
  243. ^ Kosoff, Maya (8 February 2018). "YouTube Struggles to Contain Its Conspiracy Problem". Vanity Fair. Retrieved 10 April 2025.
  244. ^ Berry, David M. (19 March 2025). "Synthetic media and computational capitalism: towards a critical theory of artificial intelligence". AI & Society. doi:10.1007/s00146-025-02265-2. ISSN 1435-5655.
  245. ^ "Unreal: A quantum leap in AI video". The Week. 17 June 2025. Retrieved 20 June 2025.
  246. ^ Snow, Jackie. "AI video is getting real. Beware what comes next". Quartz. Retrieved 20 June 2025.
  247. ^ Chow, Andrew R.; Perrigo, Billy (3 June 2025). "Google's New AI Tool Generates Convincing Deepfakes of Riots, Conflict, and Election Fraud". Time. Retrieved 20 June 2025.
  248. ^ Williams (2023).
  249. ^ Olanipekun, Samson Olufemi (2025). "Computational propaganda and misinformation: AI technologies as tools of media manipulation". World Journal of Advanced Research and Reviews. 25 (1): 911–923. doi:10.30574/wjarr.2025.25.1.0131. ISSN 2581-9615.
  250. ^ Taylor & Hern (2023).
  251. ^ "To fight AI, we need 'personhood credentials,' say AI firms". Archived from the original on 24 April 2025. Retrieved 9 May 2025.
  252. ^ a b Samuel, Sigal (19 April 2022). "Why it's so damn hard to make AI fair and unbiased". Vox. Archived from the original on 5 October 2024. Retrieved 24 July 2024.
  253. ^ a b Rose (2023).
  254. ^ CNA (2019).
  255. ^ Goffrey (2008), p. 17.
  256. ^ Berdahl et al. (2023); Goffrey (2008, p. 17); Rose (2023); Russell & Norvig (2021, p. 995)
  257. ^ Christian (2020), p. 25.
  258. ^ a b Russell & Norvig (2021), p. 995.
  259. ^ Grant & Hill (2023).
  260. ^ Larson & Angwin (2016).
  261. ^ Christian (2020), pp. 67–70.
  262. ^ Christian (2020, pp. 67–70); Russell & Norvig (2021, pp. 993–994)
  263. ^ Russell & Norvig (2021, p. 995); Lipartito (2011, p. 36); Goodman & Flaxman (2017, p. 6); Christian (2020, pp. 39–40, 65)
  264. ^ Quoted in Christian (2020, p. 65).
  265. ^ Russell & Norvig (2021, p. 994); Christian (2020, pp. 40, 80–81)
  266. ^ Quoted in Christian (2020, p. 80)
  267. ^ Dockrill (2022).
  268. ^ Sample (2017).
  269. ^ "Black Box AI". 16 June 2023. Archived from the original on 15 June 2024. Retrieved 5 October 2024.
  270. ^ Christian (2020), p. 110.
  271. ^ Christian (2020), pp. 88–91.
  272. ^ Christian (2020, p. 83); Russell & Norvig (2021, p. 997)
  273. ^ Christian (2020), p. 91.
  274. ^ Christian (2020), p. 83.
  275. ^ Verma (2021).
  276. ^ Rothman (2020).
  277. ^ Christian (2020), pp. 105–108.
  278. ^ Christian (2020), pp. 108–112.
  279. ^ Ropek, Lucas (21 May 2024). "New Anthropic Research Sheds Light on AI's 'Black Box'". Gizmodo. Archived from the original on 5 October 2024. Retrieved 23 May 2024.
  280. ^ Russell & Norvig (2021), p. 989.
  281. ^ a b Russell & Norvig (2021), pp. 987–990.
  282. ^ Russell & Norvig (2021), p. 988.
  283. ^ Robitzski (2018); Sainato (2015)
  284. ^ Harari (2018).
  285. ^ Buckley, Chris; Mozur, Paul (22 May 2019). "How China Uses High-Tech Surveillance to Subdue Minorities". The New York Times. Archived from the original on 25 November 2019. Retrieved 2 July 2019.
  286. ^ "Security lapse exposed a Chinese smart city surveillance system". 3 May 2019. Archived from the original on 7 March 2021. Retrieved 14 September 2020.
  287. ^ Urbina et al. (2022).
  288. ^ a b E. McGaughey, 'Will Robots Automate Your Job Away? Full Employment, Basic Income, and Economic Democracy' (2022), 51(3) Industrial Law Journal 511–559. Archived 27 May 2023 at the Wayback Machine.
  289. ^ Ford & Colvin (2015); McGaughey (2022)
  290. ^ IGM Chicago (2017).
  291. ^ Arntz, Gregory & Zierahn (2016), p. 33.
  292. ^ Lohr (2017); Frey & Osborne (2017); Arntz, Gregory & Zierahn (2016, p. 33)
  293. ^ Zhou, Viola (11 April 2023). "AI is already taking video game illustrators' jobs in China". Rest of World. Archived from the original on 21 February 2024. Retrieved 17 August 2023.
  294. ^ Carter, Justin (11 April 2023). "China's game art industry reportedly decimated by growing AI use". Game Developer. Archived from the original on 17 August 2023. Retrieved 17 August 2023.
  295. ^ Morgenstern (2015).
  296. ^ Mahdawi (2017); Thompson (2014)
  297. ^ Tarnoff, Ben (4 August 2023). "Lessons from Eliza". The Guardian Weekly. pp. 34–39.
  298. ^ Cellan-Jones (2014).
  299. ^ Russell & Norvig (2021), p. 1001.
  300. ^ Bostrom (2014).
  301. ^ Russell (2019).
  302. ^ Bostrom (2014); Müller & Bostrom (2014); Bostrom (2015).
  303. ^ Harari (2023).
  304. ^ Müller & Bostrom (2014).
  305. ^ Leaders' concerns about the existential risks of AI around 2015: Rawlinson (2015), Holley (2015), Gibbs (2014), Sainato (2015)
  306. ^ ""Godfather of artificial intelligence" talks impact and potential of new AI". CBS News. 25 March 2023. Archived from the original on 28 March 2023. Retrieved 28 March 2023.
  307. ^ Pittis, Don (4 May 2023). "Canadian artificial intelligence leader Geoffrey Hinton piles on fears of computer takeover". CBC. Archived from the original on 7 July 2024. Retrieved 5 October 2024.
  308. ^ "'50–50 chance' that AI outsmarts humanity, Geoffrey Hinton says". BNN Bloomberg. 14 June 2024. Archived from the original on 14 June 2024. Retrieved 6 July 2024.
  309. ^ Valance (2023).
  310. ^ Taylor, Josh (7 May 2023). "Rise of artificial intelligence is inevitable but should not be feared, 'father of AI' says". The Guardian. Archived from the original on 23 October 2023. Retrieved 26 May 2023.
  311. ^ Colton, Emma (7 May 2023). "'Father of AI' says tech fears misplaced: 'You cannot stop it'". Fox News. Archived from the original on 26 May 2023. Retrieved 26 May 2023.
  312. ^ Jones, Hessie (23 May 2023). "Juergen Schmidhuber, Renowned 'Father Of Modern AI,' Says His Life's Work Won't Lead To Dystopia". Forbes. Archived from the original on 26 May 2023. Retrieved 26 May 2023.
  313. ^ McMorrow, Ryan (19 December 2023). "Andrew Ng: 'Do we think the world is better off with more or less intelligence?'". Financial Times. Archived from the original on 25 January 2024. Retrieved 30 December 2023.
  314. ^ Levy, Steven (22 December 2023). "How Not to Be Stupid About AI, With Yann LeCun". Wired. Archived from the original on 28 December 2023. Retrieved 30 December 2023.
  315. ^ Arguments that AI is not an imminent risk: Brooks (2014), Geist (2015), Madrigal (2015), Lee (2014)
  316. ^ a b Christian (2020), pp. 67, 73.
  317. ^ Yudkowsky (2008).
  318. ^ a b Anderson & Anderson (2011).
  319. ^ AAAI (2014).
  320. ^ Wallach (2010).
  321. ^ Russell (2019), p. 173.
  322. ^ Stewart, Ashley; Melton, Monica. "Hugging Face CEO says he's focused on building a 'sustainable model' for the $4.5 billion open-source-AI startup". Business Insider. Archived from the original on 25 September 2024. Retrieved 14 April 2024.
  323. ^ Wiggers, Kyle (9 April 2024). "Google open sources tools to support AI model development". TechCrunch. Archived from the original on 10 September 2024. Retrieved 14 April 2024.
  324. ^ Heaven, Will Douglas (12 May 2023). "The open-source AI boom is built on Big Tech's handouts. How long will it last?". MIT Technology Review. Retrieved 14 April 2024.
  325. ^ Brodsky, Sascha (19 December 2023). "Mistral AI's New Language Model Aims for Open Source Supremacy". AI Business. Archived from the original on 5 September 2024. Retrieved 5 October 2024.
  326. ^ Edwards, Benj (22 February 2024). "Stability announces Stable Diffusion 3, a next-gen AI image generator". Ars Technica. Archived from the original on 5 October 2024. Retrieved 14 April 2024.
  327. ^ Marshall, Matt (29 January 2024). "How enterprises are using open source LLMs: 16 examples". VentureBeat. Archived from the original on 26 September 2024. Retrieved 5 October 2024.
  328. ^ Piper, Kelsey (2 February 2024). "Should we make our most powerful AI models open source to all?". Vox. Archived from the original on 5 October 2024. Retrieved 14 April 2024.
  329. ^ Alan Turing Institute (2019). "Understanding artificial intelligence ethics and safety" (PDF). Archived (PDF) from the original on 11 September 2024. Retrieved 5 October 2024.
  330. ^ Alan Turing Institute (2023). "AI Ethics and Governance in Practice" (PDF). Archived (PDF) from the original on 11 September 2024. Retrieved 5 October 2024.
  331. ^ Floridi, Luciano; Cowls, Josh (23 June 2019). "A Unified Framework of Five Principles for AI in Society". Harvard Data Science Review. 1 (1). doi:10.1162/99608f92.8cd550d1. S2CID 198775713. Archived from the original on 7 August 2019. Retrieved 5 December 2023.
  332. ^ Buruk, Banu; Ekmekci, Perihan Elif; Arda, Berna (1 September 2020). "A critical perspective on guidelines for responsible and trustworthy artificial intelligence". Medicine, Health Care and Philosophy. 23 (3): 387–399. doi:10.1007/s11019-020-09948-1. ISSN 1572-8633. PMID 32236794. S2CID 214766800. Archived from the original on 5 October 2024. Retrieved 5 October 2024.
  333. ^ Kamila, Manoj Kumar; Jasrotia, Sahil Singh (1 January 2023). "Ethical issues in the development of artificial intelligence: recognizing the risks". International Journal of Ethics and Systems. 41 (ahead-of-print): 45–63. doi:10.1108/IJOES-05-2023-0107. ISSN 2514-9369. S2CID 259614124. Archived from the original on 5 October 2024. Retrieved 5 October 2024.
  334. ^ "AI Safety Institute releases new AI safety evaluations platform". UK Government. 10 May 2024. Archived from the original on 5 October 2024. Retrieved 14 May 2024.
  335. ^ Regulation of AI to mitigate risks: Berryhill et al. (2019), Barfield & Pagallo (2018), Iphofen & Kritikos (2019), Wirtz, Weyerer & Geyer (2018), Buiten (2019)
  336. ^ a b Vincent (2023).
  337. ^ Stanford University (2023).
  338. ^ a b c d UNESCO (2021).
  339. ^ Kissinger (2021).
  340. ^ Altman, Brockman & Sutskever (2023).
  341. ^ VOA News (25 October 2023). "UN Announces Advisory Body on Artificial Intelligence". Archived from the original on 18 September 2024. Retrieved 5 October 2024.
  342. ^ "Council of Europe opens first ever global treaty on AI for signature". Council of Europe. 5 September 2024. Archived from the original on 17 September 2024. Retrieved 17 September 2024.
  343. ^ Edwards (2023).
  344. ^ Kasperowicz (2023).
  345. ^ Fox News (2023).
  346. ^ Milmo, Dan (3 November 2023). "Hope or Horror? The great AI debate dividing its pioneers". The Guardian Weekly. pp. 10–12.
  347. ^ "The Bletchley Declaration by Countries Attending the AI Safety Summit, 1–2 November 2023". GOV.UK. 1 November 2023. Archived from the original on 1 November 2023. Retrieved 2 November 2023.
  348. ^ "Countries agree to safe and responsible development of frontier AI in landmark Bletchley Declaration". GOV.UK (Press release). Archived from the original on 1 November 2023. Retrieved 1 November 2023.
  349. ^ "Second global AI summit secures safety commitments from companies". Reuters. 21 May 2024. Retrieved 23 May 2024.
  350. ^ "Frontier AI Safety Commitments, AI Seoul Summit 2024". gov.uk. 21 May 2024. Archived from the original on 23 May 2024. Retrieved 23 May 2024.
  351. ^ a b Buntz, Brian (3 November 2024). "Quality vs. quantity: US and China chart different paths in global AI patent race in 2024 / Geographical breakdown of AI patents in 2024". R&D World. Archived from the original on 9 December 2024.
  352. ^ a b Russell & Norvig (2021), p. 9.
  353. ^ a b c Copeland, J., ed. (2004). The Essential Turing: the ideas that gave birth to the computer age. Oxford, England: Clarendon Press. ISBN 0-1982-5079-7.
  354. ^ "Google books ngram". Archived from the original on 5 October 2024. Retrieved 5 October 2024.
  355. ^ AI's immediate precursors: McCorduck (2004, pp. 51–107), Crevier (1993, pp. 27–32), Russell & Norvig (2021, pp. 8–17), Moravec (1988, p. 3)
  356. ^ a b Turing's original publication of the Turing test in "Computing machinery and intelligence": Turing (1950). Historical influence and philosophical implications: Haugeland (1985, pp. 6–9), Crevier (1993, p. 24), McCorduck (2004, pp. 70–71), Russell & Norvig (2021, pp. 2, 984).
  357. ^ Crevier (1993), pp. 47–49.
  358. ^ Russell & Norvig (2003), p. 17.
  359. ^ Russell & Norvig (2003), p. 18.
  360. ^ Newquist (1994), p. 86.
  361. ^ Simon (1965, p. 96) quoted in Crevier (1993, p. 109)
  362. ^ Minsky (1967, p. 2) quoted in Crevier (1993, p. 109)
  363. ^ Russell & Norvig (2021), p. 21.
  364. ^ Lighthill (1973).
  365. ^ NRC (1999), pp. 212–213.
  366. ^ Russell & Norvig (2021), p. 22.
  367. ^ Expert systems: Russell & Norvig (2021, pp. 23, 292), Luger & Stubblefield (2004, pp. 227–331), Nilsson (1998, chpt. 17.4), McCorduck (2004, pp. 327–335, 434–435), Crevier (1993, pp. 145–162, 197–203), Newquist (1994, pp. 155–183)
  368. ^ Russell & Norvig (2021), p. 24.
  369. ^ Nilsson (1998), p. 7.
  370. ^ McCorduck (2004), pp. 454–462.
  371. ^ Moravec (1988).
  372. ^ a b Brooks (1990).
  373. ^ Developmental robotics: Weng et al. (2001), Lungarella et al. (2003), Asada et al. (2009), Oudeyer (2010)
  374. ^ Russell & Norvig (2021), p. 25.
  375. ^ Crevier (1993, pp. 214–215), Russell & Norvig (2021, pp. 24, 26)
  376. ^ Russell & Norvig (2021), p. 26.
  377. ^ Formal and narrow methods adopted in the 1990s: Russell & Norvig (2021, pp. 24–26), McCorduck (2004, pp. 486–487)
  378. ^ AI widely used in the late 1990s: Kurzweil (2005, p. 265), NRC (1999, pp. 216–222), Newquist (1994, pp. 189–201)
  379. ^ Wong (2023).
  380. ^ Moore's Law and AI: Russell & Norvig (2021, pp. 14, 27)
  381. ^ a b c Clark (2015b).
  382. ^ Big data: Russell & Norvig (2021, p. 26)
  383. ^ Sagar, Ram (3 June 2020). "OpenAI Releases GPT-3, The Largest Model So Far". Analytics India Magazine. Archived from the original on 4 August 2020. Retrieved 15 March 2023.
  384. ^ Milmo, Dan (2 February 2023). "ChatGPT reaches 100 million users two months after launch". The Guardian. ISSN 0261-3077. Archived from the original on 3 February 2023. Retrieved 31 December 2024.
  385. ^ Gorichanaz, Tim (29 November 2023). "ChatGPT turns 1: AI chatbot's success says as much about humans as technology". The Conversation. Archived from the original on 31 December 2024. Retrieved 31 December 2024.
  386. ^ DiFeliciantonio (2023).
  387. ^ Goswami (2023).
  388. ^ "Nearly 1 in 4 new startups is an AI company". PitchBook. 24 December 2024. Retrieved 3 January 2025.
  389. ^ Grayling, Anthony; Ball, Brian (1 August 2024). "Philosophy is crucial in the age of AI". The Conversation. Archived from the original on 5 October 2024. Retrieved 4 October 2024.
  390. ^ a b Jarow, Oshan (15 June 2024). "Will AI ever become conscious? It depends on how you think about biology". Vox. Archived from the original on 21 September 2024. Retrieved 4 October 2024.
  391. ^ McCarthy, John. "The Philosophy of AI and the AI of Philosophy". jmc.stanford.edu. Archived from the original on 23 October 2018. Retrieved 3 October 2024.
  392. ^ a b Turing (1950), p. 1.
  393. ^ Turing (1950), under "The Argument from Consciousness".
  394. ^ Kirk-Giannini, Cameron Domenico; Goldstein, Simon (16 October 2023). "AI is closer than ever to passing the Turing test for 'intelligence'. What happens when it does?". The Conversation. Archived from the original on 25 September 2024. Retrieved 17 August 2024.
  395. ^ Russell & Norvig (2021), p. 3.
  396. ^ Maker (2006).
  397. ^ McCarthy (1999).
  398. ^ Minsky (1986).
  399. ^ "What Is Artificial Intelligence (AI)?". Google Cloud Platform. Archived from the original on 31 July 2023. Retrieved 16 October 2023.
  400. ^ "One of the Biggest Problems in Regulating AI Is Agreeing on a Definition". Carnegie Endowment for International Peace. Retrieved 31 July 2024.
  401. ^ "AI or BS? How to tell if a marketing tool really uses artificial intelligence". The Drum. Retrieved 31 July 2024.
  402. ^ Musser, George (1 September 2023). "How AI Knows Things No One Told It". Scientific American. Retrieved 17 July 2025.
  403. ^ Nilsson (1983), p. 10.
  404. ^ Haugeland (1985), pp. 112–117.
  405. ^ Physical symbol system hypothesis: Newell & Simon (1976, p. 116). Historical significance: McCorduck (2004, p. 153), Russell & Norvig (2021, p. 19).
  406. ^ Moravec's paradox: Moravec (1988, pp. 15–16), Minsky (1986, p. 29), Pinker (2007, pp. 190–191)
  407. ^ Dreyfus' critique of AI: Dreyfus (1972), Dreyfus & Dreyfus (1986). Historical significance and philosophical implications: Crevier (1993, pp. 120–132), McCorduck (2004, pp. 211–239), Russell & Norvig (2021, pp. 981–982), Fearn (2007, chpt. 3).
  408. ^ Crevier (1993), p. 125.
  409. ^ Langley (2011).
  410. ^ Katz (2012).
  411. ^ Neats vs. scruffies, the historic debate: McCorduck (2004, pp. 421–424, 486–489), Crevier (1993, p. 168), Nilsson (1983, pp. 10–11), Russell & Norvig (2021, p. 24). A classic example of the "scruffy" approach to intelligence: Minsky (1986). A modern example of neat AI and its aspirations in the 21st century: Domingos (2015).
  412. ^ Pennachin & Goertzel (2007).
  413. ^ a b Roberts (2016).
  414. ^ Russell & Norvig (2021), p. 986.
  415. ^ Chalmers (1995).
  416. ^ Dennett (1991).
  417. ^ Horst (2005).
  418. ^ Searle (1999).
  419. ^ Searle (1980), p. 1.
  420. ^ Russell & Norvig (2021), p. 9817.
  421. ^ Searle's Chinese room argument: Searle (1980), Searle's original presentation of the thought experiment; Searle (1999). Discussion: Russell & Norvig (2021, p. 985), McCorduck (2004, pp. 443–445), Crevier (1993, pp. 269–271).
  422. ^ Leith, Sam (7 July 2022). "Nick Bostrom: How can we be certain a machine isn't conscious?". The Spectator. Archived from the original on 26 September 2024. Retrieved 23 February 2024.
  423. ^ a b c Thomson, Jonny (31 October 2022). "Why don't robots have rights?". Big Think. Archived from the original on 13 September 2024. Retrieved 23 February 2024.
  424. ^ a b Kateman, Brian (24 July 2023). "AI Should Be Terrified of Humans". Time. Archived from the original on 25 September 2024. Retrieved 23 February 2024.
  425. ^ Wong, Jeff (10 July 2023). "What leaders need to know about robot rights". Fast Company.
  426. ^ Hern, Alex (12 January 2017). "Give robots 'personhood' status, EU committee argues". The Guardian. ISSN 0261-3077. Archived from the original on 5 October 2024. Retrieved 23 February 2024.
  427. ^ Dovey, Dana (14 April 2018). "Experts Don't Think Robots Should Have Rights". Newsweek. Archived from the original on 5 October 2024. Retrieved 23 February 2024.
  428. ^ Cuddy, Alice (13 April 2018). "Robot rights violate human rights, experts warn EU". euronews. Archived from the original on 19 September 2024. Retrieved 23 February 2024.
  429. ^ The intelligence explosion and technological singularity: Russell & Norvig (2021, pp. 1004–1005), Omohundro (2008), Kurzweil (2005). I. J. Good's "intelligence explosion": Good (1965). Vernor Vinge's "singularity": Vinge (1993).
  430. ^ Russell & Norvig (2021), p. 1005.
  431. ^ Transhumanism: Moravec (1988), Kurzweil (2005), Russell & Norvig (2021, p. 1005)
  432. ^ AI as evolution: Edward Fredkin is quoted in McCorduck (2004, p. 401), Butler (1863), Dyson (1998)
  433. ^ AI in myth: McCorduck (2004, pp. 4–5)
  434. ^ McCorduck (2004), pp. 340–400.
  435. ^ Buttazzo (2001).
  436. ^ Anderson (2008).
  437. ^ McCauley (2007).
  438. ^ Galvan (1997).

Further reading

  • Autor, David H., "Why Are There Still So Many Jobs? The History and Future of Workplace Automation" (2015) 29(3) Journal of Economic Perspectives 3.
  • Boyle, James, The Line: AI and the Future of Personhood, MIT Press, 2024.
  • Cukier, Kenneth, "Ready for Robots? How to Think about the Future of AI", Foreign Affairs, vol. 98, no. 4 (July/August 2019), pp. 192–198. George Dyson, historian of computing, writes (in what might be called "Dyson's Law") that "Any system simple enough to be understandable will not be complicated enough to behave intelligently, while any system complicated enough to behave intelligently will be too complicated to understand." (p. 197.) Computer scientist Alex Pentland writes: "Current AI machine-learning algorithms are, at their core, dead simple stupid. They work, but they work by brute force." (p. 198.)
  • Evans, Woody (2015). "Posthuman Rights: Dimensions of Transhuman Worlds". Teknokultura. 12 (2). doi:10.5209/rev_TK.2015.v12.n2.49072. S2CID 147612763.
  • Frank, Michael (22 September 2023). "US Leadership in Artificial Intelligence Can Shape the 21st Century Global Order". The Diplomat. Archived from the original on 16 September 2024. Retrieved 8 December 2023. Instead, the United States has developed a new area of dominance that the rest of the world views with a mixture of awe, envy, and resentment: artificial intelligence... From AI models and research to cloud computing and venture capital, U.S. companies, universities, and research labs – and their affiliates in allied countries – appear to have an enormous lead in both developing cutting-edge AI and commercializing it. The value of U.S. venture capital investments in AI start-ups exceeds that of the rest of the world combined.
  • Gertner, Jon (2023). "Wikipedia's Moment of Truth: Can the online encyclopedia help teach A.I. chatbots to get their facts right — without destroying itself in the process?" New York Times Magazine (July 18, 2023). Archived from the original on 20 July 2023.
  • Gleick, James, "The Fate of Free Will" (review of Kevin J. Mitchell, Free Agents: How Evolution Gave Us Free Will, Princeton University Press, 2023, 333 pp.), The New York Review of Books, vol. LXXI, no. 1 (18 January 2024), pp. 27–28, 30. "Agency is what distinguishes us from machines. For biological creatures, reason and purpose come from acting in the world and experiencing the consequences. Artificial intelligences – disembodied, strangers to blood, sweat, and tears – have no occasion for that." (p. 30.)
  • Gleick, James, "The Parrot in the Machine" (review of Emily M. Bender and Alex Hanna, The AI Con: How to Fight Big Tech's Hype and Create the Future We Want, Harper, 274 pp.; and James Boyle, The Line: AI and the Future of Personhood, MIT Press, 326 pp.), The New York Review of Books, vol. LXXII, no. 12 (24 July 2025), pp. 43–46. "[C]hatbox 'writing' has a bland, regurgitated quality. Textures are flattened, sharp edges are sanded. No chatbox could ever have said that April is the cruelest month or that fog comes on little cat feet (though they might now, because one of their chief skills is plagiarism). And when synthetically extruded text turns out wrong, it can be comically wrong. When a movie fan asked Google whether a certain actor was in Heat, he received this 'AI Overview': 'No, Angelina Jolie is not in heat.'" (p. 44.)
  • Halpern, Sue, "The Coming Tech Autocracy" (review of Verity Harding, AI Needs You: How We Can Change AI's Future and Save Our Own, Princeton University Press, 274 pp.; Gary Marcus, Taming Silicon Valley: How We Can Ensure That AI Works for Us, MIT Press, 235 pp.; Daniela Rus and Gregory Mone, The Mind's Mirror: Risk and Reward in the Age of AI, Norton, 280 pp.; Madhumita Murgia, Code Dependent: Living in the Shadow of AI, Henry Holt, 311 pp.), The New York Review of Books, vol. LXXI, no. 17 (7 November 2024), pp. 44–46. "'We can't realistically expect that those who hope to get rich from AI are going to have the interests of the rest of us close at heart,' ... writes [Gary Marcus]. 'We can't count on governments driven by campaign finance contributions [from tech companies] to push back.'... Marcus details the demands that citizens should make of their governments and the tech companies. They include transparency on how AI systems work; compensation for individuals if their data [are] used to train LLMs [large language models] and the right to consent to this use; and the ability to hold tech companies liable for the harms they cause by eliminating Section 230, imposing cash penalties, and passing stricter product liability laws... Marcus also suggests... that a new, AI-specific federal agency, akin to the FDA, the FCC, or the FTC, might provide the most robust oversight.... [T]he Fordham law professor Chinmayi Sharma... suggests... establish[ing] a professional licensing regime for engineers that would function in a similar way to medical licenses, malpractice suits, and the Hippocratic oath in medicine. 'What if, like doctors,' she asks..., 'AI engineers also vowed to do no harm?'" (p. 46.)
  • Henderson, Mark (24 April 2007). "Human rights for robots? We're getting carried away". The Times Online. London. Archived from the original on 31 May 2014. Retrieved 31 May 2014.
  • Hughes-Castleberry, Kenna, "A Murder Mystery Puzzle: The literary puzzle Cain's Jawbone, which has stumped humans for decades, reveals the limitations of natural-language-processing algorithms", Scientific American, vol. 329, no. 4 (November 2023), pp. 81–82. "This murder mystery competition has revealed that although NLP (natural-language processing) models are capable of incredible feats, their abilities are very much limited by the amount of context they receive. This [...] could cause [difficulties] for researchers who hope to use them to do things such as analyze ancient languages. In some cases, there are few historical records on long-gone civilizations to serve as training data for such a purpose." (p. 82.)
  • Immerwahr, Daniel, "Your Lying Eyes: People now use A.I. to generate fake videos indistinguishable from real ones. How much does it matter?", The New Yorker, 20 November 2023, pp. 54–59. "If by 'deepfakes' we mean realistic videos produced using artificial intelligence that actually deceive people, then they barely exist. The fakes aren't deep, and the deeps aren't fake. [...] A.I.-generated videos are not, in general, operating in our media as counterfeited evidence. Their role better resembles that of cartoons, especially smutty ones." (p. 59.)
  • Johnston, John (2008) The Allure of Machinic Life: Cybernetics, Artificial Life, and the New AI, MIT Press.
  • Jumper, John; Evans, Richard; Pritzel, Alexander; et al. (26 August 2021). "Highly accurate protein structure prediction with AlphaFold". Nature. 596 (7873): 583–589. Bibcode:2021Natur.596..583J. doi:10.1038/s41586-021-03819-2. PMC 8371605. PMID 34265844. S2CID 235959867.
  • LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (28 May 2015). "Deep learning". Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096. Archived from the original on 5 June 2023. Retrieved 19 June 2023.
  • Leffer, Lauren, "The Risks of Trusting AI: We must avoid humanizing machine-learning models used in scientific research", Scientific American, vol. 330, no. 6 (June 2024), pp. 80–81.
  • Lepore, Jill, "The Chit-Chatbot: Is talking with a machine a conversation?", The New Yorker, 7 October 2024, pp. 12–16.
  • Maschafilm (2010). "Content: Plug & Pray Film – Artificial Intelligence – Robots". plugandpray-film.de. Archived from the original on 12 February 2016.
  • Marcus, Gary, "Artificial Confidence: Even the newest, buzziest systems of artificial general intelligence are stymied by the same old problems", Scientific American, vol. 327, no. 4 (October 2022), pp. 42–45.
  • Mitchell, Melanie (2019). Artificial intelligence: a guide for thinking humans. New York: Farrar, Straus and Giroux. ISBN 978-0-3742-5783-5.
  • Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; et al. (26 February 2015). "Human-level control through deep reinforcement learning". Nature. 518 (7540): 529–533. Bibcode:2015Natur.518..529M. doi:10.1038/nature14236. PMID 25719670. S2CID 205242740. Archived from the original on 19 June 2023. Retrieved 19 June 2023. Introduced DQN, which produced human-level performance on some Atari games.
  • Press, Eyal, "In Front of Their Faces: Does facial-recognition technology lead police to ignore contradictory evidence?", The New Yorker, 20 November 2023, pp. 20–26.
  • "Robots could demand legal rights". BBC News. 21 December 2006. Archived from the original on 15 October 2019. Retrieved 3 February 2011.
  • Roivainen, Eka, "AI's IQ: ChatGPT aced a [standard intelligence] test but showed that intelligence cannot be measured by IQ alone", Scientific American, vol. 329, no. 1 (July/August 2023), p. 7. "Despite its high IQ, ChatGPT fails at tasks that require real humanlike reasoning or an understanding of the physical and social world.... ChatGPT seemed unable to reason logically and tried to rely on its vast database of... facts derived from online texts."
  • Scharre, Paul, "Killer Apps: The Real Dangers of an AI Arms Race", Foreign Affairs, vol. 98, no. 3 (May/June 2019), pp. 135–144. "Today's AI technologies are powerful but unreliable. Rules-based systems cannot deal with circumstances their programmers did not anticipate. Learning systems are limited by the data on which they were trained. AI failures have already led to tragedy. Advanced autopilot features in cars, although they perform well in some circumstances, have driven cars without warning into trucks, concrete barriers, and parked cars. In the wrong situation, AI systems go from supersmart to superdumb in an instant. When an enemy is trying to manipulate and hack an AI system, the risks are even greater." (p. 140.)
  • Schulz, Hannes; Behnke, Sven (1 November 2012). "Deep Learning". KI – Künstliche Intelligenz. 26 (4): 357–363. doi:10.1007/s13218-012-0198-z. ISSN 1610-1987. S2CID 220523562.
  • Serenko, Alexander; Michael Dohan (2011). "Comparing the expert survey and citation impact journal ranking methods: Example from the field of Artificial Intelligence" (PDF). Journal of Informetrics. 5 (4): 629–649. doi:10.1016/j.joi.2011.06.002. Archived (PDF) from the original on 4 October 2013. Retrieved 12 September 2013.
  • Silver, David; Huang, Aja; Maddison, Chris J.; et al. (28 January 2016). "Mastering the game of Go with deep neural networks and tree search". Nature. 529 (7587): 484–489. Bibcode:2016Natur.529..484S. doi:10.1038/nature16961. PMID 26819042. S2CID 515925. Archived from the original on 18 June 2023. Retrieved 19 June 2023.
  • Tarnoff, Ben, "The Labor Theory of AI" (review of Matteo Pasquinelli, The Eye of the Master: A Social History of Artificial Intelligence, Verso, 2024, 264 pp.), The New York Review of Books, vol. LXXII, no. 5 (27 March 2025), pp. 30–32. The reviewer, Ben Tarnoff, writes: "The strangeness at the heart of the generative AI boom is that nobody really knows how the technology works. We know how the large language models within ChatGPT and its counterparts are trained, even if we don't always know which data they're being trained on: they are asked to predict the next string of characters in a sequence. But exactly how they arrive at any given prediction is a mystery. The computations that occur inside the model are simply too intricate for any human to comprehend." (p. 32.)
  • Vaswani, Ashish, Noam Shazeer, Niki Parmar et al. "Attention is all you need." Advances in neural information processing systems 30 (2017). Seminal paper on transformers.
  • Vincent, James, "Horny Robot Baby Voice: James Vincent on AI chatbots", London Review of Books, vol. 46, no. 19 (10 October 2024), pp. 29–32. "[AI chatbot] programs are made possible by new technologies but rely on the timeless human tendency to anthropomorphise." (p. 29.)
  • White Paper: On Artificial Intelligence – A European approach to excellence and trust (PDF). Brussels: European Commission. 2020. Archived (PDF) from the original on 20 February 2020. Retrieved 20 February 2020.
Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals.[1] High-profile applications of AI include advanced web search engines (e.g., Google Search); recommendation systems (used by YouTube, Amazon, and Netflix); virtual assistants (e.g., Google Assistant, Siri, and Alexa); autonomous vehicles (e.g., Waymo); generative and creative tools (e.g., language models and AI art); and superhuman play and analysis in strategy games (e.g., chess and Go). However, many AI applications are not perceived as AI: "A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore."[2][3] Various subfields of AI research are centered around particular goals and the use of particular tools.
The traditional goals of AI research include learning, reasoning, knowledge representation, planning, natural language processing, perception, and support for robotics.[a] To reach these goals, AI researchers have adapted and integrated a wide range of techniques, including search and mathematical optimization, formal logic, artificial neural networks, and methods based on statistics, operations research, and economics.[b] AI also draws upon psychology, linguistics, philosophy, neuroscience, and other fields.[4] Some companies, such as OpenAI, Google DeepMind and Meta,[5] aim to create artificial general intelligence (AGI)—AI that can complete virtually any cognitive task at least as well as a human. Artificial intelligence was founded as an academic discipline in 1956,[6] and the field went through multiple cycles of optimism throughout its history,[7][8] followed by periods of disappointment and loss of funding, known as AI winters.[9][10] Funding and interest vastly increased after 2012 when graphics processing units started being used to accelerate neural networks and deep learning outperformed previous AI techniques.[11] This growth accelerated further after 2017 with the transformer architecture.[12] In the 2020s, an ongoing period of rapid progress in advanced generative AI became known as the AI boom. Generative AI's ability to create and modify content has led to several unintended consequences and harms, which has raised ethical concerns about AI's long-term effects and potential existential risks, prompting discussions about regulatory policies to ensure the safety and benefits of the technology. Goals The general problem of simulating (or creating) intelligence has been broken into subproblems. These consist of particular traits or capabilities that researchers expect an intelligent system to display. 
The traits described below have received the most attention and cover the scope of AI research.[a] Reasoning and problem-solving Early researchers developed algorithms that imitated step-by-step reasoning that humans use when they solve puzzles or make logical deductions.[13] By the late 1980s and 1990s, methods were developed for dealing with uncertain or incomplete information, employing concepts from probability and economics.[14] Many of these algorithms are insufficient for solving large reasoning problems because they experience a "combinatorial explosion": They become exponentially slower as the problems grow.[15] Even humans rarely use the step-by-step deduction that early AI research could model. They solve most of their problems using fast, intuitive judgments.[16] Accurate and efficient reasoning is an unsolved problem. Knowledge representation An ontology represents knowledge as a set of concepts within a domain and the relationships between those concepts. Knowledge representation and knowledge engineering[17] allow AI programs to answer questions intelligently and make deductions about real-world facts. Formal knowledge representations are used in content-based indexing and retrieval,[18] scene interpretation,[19] clinical decision support,[20] knowledge discovery (mining "interesting" and actionable inferences from large databases),[21] and other areas.[22] A knowledge base is a body of knowledge represented in a form that can be used by a program. 
An ontology is the set of objects, relations, concepts, and properties used by a particular domain of knowledge.[23] Knowledge bases need to represent things such as objects, properties, categories, and relations between objects;[24] situations, events, states, and time;[25] causes and effects;[26] knowledge about knowledge (what we know about what other people know);[27] default reasoning (things that humans assume are true until they are told differently and will remain true even when other facts are changing);[28] and many other aspects and domains of knowledge. Among the most difficult problems in knowledge representation are the breadth of commonsense knowledge (the set of atomic facts that the average person knows is enormous);[29] and the sub-symbolic form of most commonsense knowledge (much of what people know is not represented as "facts" or "statements" that they could express verbally).[16] There is also the difficulty of knowledge acquisition, the problem of obtaining knowledge for AI applications.[c] Planning and decision-making An "agent" is anything that perceives and takes actions in the world. A rational agent has goals or preferences and takes actions to make them happen.[d][32] In automated planning, the agent has a specific goal.[33] In automated decision-making, the agent has preferences—there are some situations it would prefer to be in, and some situations it is trying to avoid. The decision-making agent assigns a number to each situation (called the "utility") that measures how much the agent prefers it. For each possible action, it can calculate the "expected utility": the utility of all possible outcomes of the action, weighted by the probability that the outcome will occur. 
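The expected-utility calculation just described can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the cited sources: the umbrella scenario, the probabilities, the utility numbers, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy problem: each action maps to a list of
# (probability, utility) pairs, one pair per possible outcome.
Outcomes = List[Tuple[float, float]]


def expected_utility(outcomes: Outcomes) -> float:
    """Sum of the outcome utilities, each weighted by its probability."""
    return sum(p * u for p, u in outcomes)


def best_action(actions: Dict[str, Outcomes]) -> str:
    """Return the name of the action with the largest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Assumed numbers: 30% chance of rain, 70% chance of no rain.
actions = {
    "take_umbrella": [(0.3, 8.0), (0.7, 6.0)],
    "leave_umbrella": [(0.3, 0.0), (0.7, 10.0)],
}

for name, outcomes in actions.items():
    print(name, round(expected_utility(outcomes), 2))
print("chosen:", best_action(actions))
```

With these assumed numbers, leaving the umbrella has the higher expected utility (7.0 against 6.6), so a utility-maximizing agent would pick it even though it risks the worst single outcome.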
It can then choose the action with the maximum expected utility.[34] In classical planning, the agent knows exactly what the effect of any action will be.[35] In most real-world problems, however, the agent may not be certain about the situation they are in (it is "unknown" or "unobservable") and it may not know for certain what will happen after each possible action (it is not "deterministic"). It must choose an action by making a probabilistic guess and then reassess the situation to see if the action worked.[36] In some problems, the agent's preferences may be uncertain, especially if there are other agents or humans involved. These can be learned (e.g., with inverse reinforcement learning), or the agent can seek information to improve its preferences.[37] Information value theory can be used to weigh the value of exploratory or experimental actions.[38] The space of possible future actions and situations is typically intractably large, so the agents must take actions and evaluate situations while being uncertain of what the outcome will be. A Markov decision process has a transition model that describes the probability that a particular action will change the state in a particular way and a reward function that supplies the utility of each state and the cost of each action. A policy associates a decision with each possible state. The policy could be calculated (e.g., by iteration), be heuristic, or it can be learned.[39] Game theory describes the rational behavior of multiple interacting agents and is used in AI programs that make decisions that involve other agents.[40] Learning Machine learning is the study of programs that can improve their performance on a given task automatically.[41] It has been a part of AI from the beginning.[e] In supervised learning, the training data is labelled with the expected answers, while in unsupervised learning, the model identifies patterns or structures in unlabelled data. There are several kinds of machine learning. 
Unsupervised learning analyzes a stream of data and finds patterns and makes predictions without any other guidance.[44] Supervised learning requires labeling the training data with the expected answers, and comes in two main varieties: classification (where the program must learn to predict what category the input belongs in) and regression (where the program must deduce a numeric function based on numeric input).[45] In reinforcement learning, the agent is rewarded for good responses and punished for bad ones. The agent learns to choose responses that are classified as "good".[46] Transfer learning is when the knowledge gained from one problem is applied to a new problem.[47] Deep learning is a type of machine learning that runs inputs through biologically inspired artificial neural networks for all of these types of learning.[48] Computational learning theory can assess learners by computational complexity, by sample complexity (how much data is required), or by other notions of optimization.[49] Natural language processing Natural language processing (NLP) allows programs to read, write and communicate in human languages.[50] Specific problems include speech recognition, speech synthesis, machine translation, information extraction, information retrieval and question answering.[51] Early work, based on Noam Chomsky's generative grammar and semantic networks, had difficulty with word-sense disambiguation[f] unless restricted to small domains called "micro-worlds" (due to the common sense knowledge problem[29]). Margaret Masterman believed that it was meaning and not grammar that was the key to understanding languages, and that thesauri and not dictionaries should be the basis of computational language structure. 
Modern deep learning techniques for NLP include word embedding (representing words, typically as vectors encoding their meaning),[52] transformers (a deep learning architecture using an attention mechanism),[53] and others.[54] In 2019, generative pre-trained transformer (or "GPT") language models began to generate coherent text,[55][56] and by 2023, these models were able to get human-level scores on the bar exam, SAT test, GRE test, and many other real-world applications.[57] Perception Machine perception is the ability to use input from sensors (such as cameras, microphones, wireless signals, active lidar, sonar, radar, and tactile sensors) to deduce aspects of the world. Computer vision is the ability to analyze visual input.[58] The field includes speech recognition,[59] image classification,[60] facial recognition, object recognition,[61] object tracking,[62] and robotic perception.[63] Social intelligence Kismet, a robot head which was made in the 1990s; it is a machine that can recognize and simulate emotions.[64] Affective computing is a field that comprises systems that recognize, interpret, process, or simulate human feeling, emotion, and mood.[65] For example, some virtual assistants are programmed to speak conversationally or even to banter humorously; it makes them appear more sensitive to the emotional dynamics of human interaction, or to otherwise facilitate human–computer interaction. 
However, this tends to give naïve users an unrealistic conception of the intelligence of existing computer agents.[66] Moderate successes related to affective computing include textual sentiment analysis and, more recently, multimodal sentiment analysis, wherein AI classifies the affects displayed by a videotaped subject.[67]

General intelligence

A machine with artificial general intelligence would be able to solve a wide variety of problems with breadth and versatility similar to human intelligence.[68]

Techniques

AI research uses a wide variety of techniques to accomplish the goals above.[b]

Search and optimization

AI can solve many problems by intelligently searching through many possible solutions.[69] There are two very different kinds of search used in AI: state space search and local search.

State space search

State space search searches through a tree of possible states to try to find a goal state.[70] For example, planning algorithms search through trees of goals and subgoals, attempting to find a path to a target goal, a process called means-ends analysis.[71] Simple exhaustive searches[72] are rarely sufficient for most real-world problems: the search space (the number of places to search) quickly grows to astronomical numbers. The result is a search that is too slow or never completes.[15] "Heuristics" or "rules of thumb" can help prioritize choices that are more likely to reach a goal.[73] Adversarial search is used for game-playing programs, such as chess or Go. It searches through a tree of possible moves and countermoves, looking for a winning position.[74]

Local search

[Figure: Illustration of gradient descent for 3 different starting points; two parameters (represented by the plane coordinates) are adjusted in order to minimize the loss function (the height).]

Local search uses mathematical optimization to find a solution to a problem.
It begins with some form of guess and refines it incrementally.[75] Gradient descent is a type of local search that optimizes a set of numerical parameters by incrementally adjusting them to minimize a loss function. Variants of gradient descent are commonly used to train neural networks,[76] through the backpropagation algorithm. Another type of local search is evolutionary computation, which aims to iteratively improve a set of candidate solutions by "mutating" and "recombining" them, selecting only the fittest to survive each generation.[77] Distributed search processes can coordinate via swarm intelligence algorithms. Two popular swarm algorithms used in search are particle swarm optimization (inspired by bird flocking) and ant colony optimization (inspired by ant trails).[78] Logic Formal logic is used for reasoning and knowledge representation.[79] Formal logic comes in two main forms: propositional logic (which operates on statements that are true or false and uses logical connectives such as "and", "or", "not" and "implies")[80] and predicate logic (which also operates on objects, predicates and relations and uses quantifiers such as "Every X is a Y" and "There are some Xs that are Ys").[81] Deductive reasoning in logic is the process of proving a new statement (conclusion) from other statements that are given and assumed to be true (the premises).[82] Proofs can be structured as proof trees, in which nodes are labelled by sentences, and children nodes are connected to parent nodes by inference rules. Given a problem and a set of premises, problem-solving reduces to searching for a proof tree whose root node is labelled by a solution of the problem and whose leaf nodes are labelled by premises or axioms. 
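To make the idea of deducing a conclusion from premises concrete, here is a small Python sketch for propositional logic. Note the swap in technique: it does not search for a proof tree as described above, but checks entailment by enumerating every truth assignment (a truth table), which is feasible only for a handful of symbols. The symbols `rain` and `wet` and all function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List

Model = Dict[str, bool]               # one truth assignment
Sentence = Callable[[Model], bool]    # a sentence evaluated in a model


def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b


def entails(premises: List[Sentence], conclusion: Sentence,
            symbols: List[str]) -> bool:
    """True if the conclusion holds in every model that satisfies all premises."""
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if all(p(model) for p in premises) and not conclusion(model):
            return False  # found a counterexample model
    return True


symbols = ["rain", "wet"]
rain_implies_wet = lambda m: implies(m["rain"], m["wet"])
rain = lambda m: m["rain"]
wet = lambda m: m["wet"]

# Valid: from "rain implies wet" and "rain", conclude "wet".
print(entails([rain_implies_wet, rain], wet, symbols))
# Invalid: from "rain implies wet" and "wet", "rain" does not follow.
print(entails([rain_implies_wet, wet], rain, symbols))
```

The enumeration visits 2^n assignments for n symbols, which is one way to see why exhaustive approaches to logical reasoning scale poorly.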
In the case of Horn clauses, problem-solving search can be performed by reasoning forwards from the premises or backwards from the problem.[83] In the more general case of the clausal form of first-order logic, resolution is a single, axiom-free rule of inference, in which a problem is solved by proving a contradiction from premises that include the negation of the problem to be solved.[84] Inference in both Horn clause logic and first-order logic is undecidable, and therefore intractable. However, backward reasoning with Horn clauses, which underpins computation in the logic programming language Prolog, is Turing complete. Moreover, its efficiency is competitive with computation in other symbolic programming languages.[85] Fuzzy logic assigns a "degree of truth" between 0 and 1. It can therefore handle propositions that are vague and partially true.[86] Non-monotonic logics, including logic programming with negation as failure, are designed to handle default reasoning.[28] Other specialized versions of logic have been developed to describe many complex domains. Probabilistic methods for uncertain reasoning A simple Bayesian network, with the associated conditional probability tables Many problems in AI (including reasoning, planning, learning, perception, and robotics) require the agent to operate with incomplete or uncertain information. 
AI researchers have devised a number of tools to solve these problems using methods from probability theory and economics.[87] Precise mathematical tools have been developed that analyze how an agent can make choices and plan, using decision theory, decision analysis,[88] and information value theory.[89] These tools include models such as Markov decision processes,[90] dynamic decision networks,[91] game theory and mechanism design.[92] Bayesian networks[93] are a tool that can be used for reasoning (using the Bayesian inference algorithm),[g][95] learning (using the expectation–maximization algorithm),[h][97] planning (using decision networks)[98] and perception (using dynamic Bayesian networks).[91] Probabilistic algorithms can also be used for filtering, prediction, smoothing, and finding explanations for streams of data, thus helping perception systems analyze processes that occur over time (e.g., hidden Markov models or Kalman filters).[91] Expectation–maximization clustering of Old Faithful eruption data starts from a random guess but then successfully converges on an accurate clustering of the two physically distinct modes of eruption. Classifiers and statistical learning methods The simplest AI applications can be divided into two types: classifiers (e.g., "if shiny then diamond"), on one hand, and controllers (e.g., "if diamond then pick up"), on the other hand. Classifiers[99] are functions that use pattern matching to determine the closest match. They can be fine-tuned based on chosen examples using supervised learning. Each pattern (also called an "observation") is labeled with a certain predefined class. All the observations combined with their class labels are known as a data set. 
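As a rough illustration of a classifier that picks "the closest match" from a labeled data set, the following Python sketch implements a one-nearest-neighbor rule. The two numeric features (imagined here as shininess and hardness scores), the labels, and the function name are assumptions made for the example, not details from the article.

```python
import math
from typing import List, Sequence, Tuple

# One labeled observation: (feature vector, class label).
Observation = Tuple[Sequence[float], str]


def classify(dataset: List[Observation], query: Sequence[float]) -> str:
    """Return the label of the stored observation closest to `query`."""
    _, label = min(dataset, key=lambda obs: math.dist(obs[0], query))
    return label


# Hypothetical data set: (shininess, hardness) scores in [0, 1].
dataset = [
    ((0.9, 0.8), "diamond"),
    ((0.8, 0.9), "diamond"),
    ((0.2, 0.1), "pebble"),
    ((0.1, 0.3), "pebble"),
]

print(classify(dataset, (0.85, 0.7)))
print(classify(dataset, (0.15, 0.2)))
```

Here "training" is nothing more than storing the labeled observations; all the work happens when a new observation is compared against them.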
When a new observation is received, that observation is classified based on previous experience.[45] There are many kinds of classifiers in use.[100] The decision tree is the simplest and most widely used symbolic machine learning algorithm.[101] K-nearest neighbor algorithm was the most widely used analogical AI until the mid-1990s, and Kernel methods such as the support vector machine (SVM) displaced k-nearest neighbor in the 1990s.[102] The naive Bayes classifier is reportedly the "most widely used learner"[103] at Google, due in part to its scalability.[104] Neural networks are also used as classifiers.[105] Artificial neural networks A neural network is an interconnected group of nodes, akin to the vast network of neurons in the human brain. An artificial neural network is based on a collection of nodes also known as artificial neurons, which loosely model the neurons in a biological brain. It is trained to recognise patterns; once trained, it can recognise those patterns in fresh data. There is an input, at least one hidden layer of nodes and an output. Each node applies a function and once the weight crosses its specified threshold, the data is transmitted to the next layer. A network is typically called a deep neural network if it has at least 2 hidden layers.[105] Learning algorithms for neural networks use local search to choose the weights that will get the right output for each input during training. The most common training technique is the backpropagation algorithm.[106] Neural networks learn to model complex relationships between inputs and outputs and find patterns in data. 
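The layered structure described above (inputs, a hidden layer, an output, and nodes that fire once a threshold is crossed) can be sketched as follows. This is a toy under stated assumptions: the weights and biases are picked by hand rather than learned, and training by backpropagation is not shown because the hard threshold used here is not differentiable. All names are hypothetical.

```python
from typing import List, Sequence


def step(total: float) -> int:
    """Threshold activation: the node fires (1) only if its input sum is positive."""
    return 1 if total > 0 else 0


def layer(inputs: Sequence[float], weights: List[List[float]],
          biases: List[float]) -> List[int]:
    """Each node takes a weighted sum of the inputs plus a bias, then applies `step`."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(step(total))
    return outputs


# Hand-picked parameters (an assumption for illustration, not learned values).
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [-0.5, -1.5]   # first hidden node behaves like OR, second like AND
OUTPUT_W = [[1.0, -1.0]]
OUTPUT_B = [-0.5]         # output fires for "OR but not AND"


def xor_net(x1: int, x2: int) -> int:
    """Two inputs -> two hidden nodes -> one output, computing exclusive-or."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Exclusive-or is a standard example because no single threshold node can compute it; the hidden layer is what makes it possible.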
In theory, a neural network can learn any function.[107] In feedforward neural networks the signal passes in only one direction.[108] The term perceptron typically refers to a single-layer neural network.[109] In contrast, deep learning uses many layers.[110] Recurrent neural networks (RNNs) feed the output signal back into the input, which allows short-term memories of previous input events. Long short-term memory networks (LSTMs) are recurrent neural networks that better preserve long-term dependencies and are less sensitive to the vanishing gradient problem.[111] Convolutional neural networks (CNNs) use layers of kernels to more efficiently process local patterns. This local processing is especially important in image processing, where the early CNN layers typically identify simple local patterns such as edges and curves, with subsequent layers detecting more complex patterns like textures, and eventually whole objects.[112]

Deep learning

[Figure: Deep learning is a subset of machine learning, which is itself a subset of artificial intelligence.[113]]

Deep learning uses several layers of neurons between the network's inputs and outputs.[110] The multiple layers can progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify the concepts relevant to a human such as digits, letters, or faces.[114] Deep learning has profoundly improved the performance of programs in many important subfields of artificial intelligence, including computer vision, speech recognition, natural language processing, image classification,[115] and others.
The reason that deep learning performs so well in so many applications is not known as of 2021.[116] The sudden success of deep learning in 2012–2015 did not occur because of some new discovery or theoretical breakthrough (deep neural networks and backpropagation had been described by many people, as far back as the 1950s)[i] but because of two factors: the incredible increase in computer power (including the hundred-fold increase in speed by switching to GPUs) and the availability of vast amounts of training data, especially the giant curated datasets used for benchmark testing, such as ImageNet.[j] GPT Generative pre-trained transformers (GPT) are large language models (LLMs) that generate text based on the semantic relationships between words in sentences. Text-based GPT models are pre-trained on a large corpus of text that can be from the Internet. The pretraining consists of predicting the next token (a token being usually a word, subword, or punctuation). Throughout this pretraining, GPT models accumulate knowledge about the world and can then generate human-like text by repeatedly predicting the next token. Typically, a subsequent training phase makes the model more truthful, useful, and harmless, usually with a technique called reinforcement learning from human feedback (RLHF). Current GPT models are prone to generating falsehoods called "hallucinations". 
These can be reduced with RLHF and quality data, but the problem has been getting worse for reasoning systems.[124] Such systems are used in chatbots, which allow people to ask a question or request a task in simple text.[125][126] Current models and services include ChatGPT, Claude, Gemini, Copilot, and Meta AI.[127] Multimodal GPT models can process different types of data (modalities) such as images, videos, sound, and text.[128]

Hardware and software

Main articles: Programming languages for artificial intelligence and Hardware for artificial intelligence

In the late 2010s, graphics processing units (GPUs) that were increasingly designed with AI-specific enhancements and used with specialized TensorFlow software had replaced the previously used central processing units (CPUs) as the dominant means of training large-scale (commercial and academic) machine learning models.[129] Specialized programming languages such as Prolog were used in early AI research,[130] but general-purpose programming languages like Python have become predominant.[131] The transistor density in integrated circuits has been observed to roughly double every 18 months, a trend known as Moore's law, named after the Intel co-founder Gordon Moore, who first identified it. Improvements in GPUs have been even faster,[132] a trend sometimes called Huang's law,[133] named after Nvidia co-founder and CEO Jensen Huang.

Applications

Main article: Applications of artificial intelligence

AI and machine learning technology is used in most of the essential applications of the 2020s, including: search engines (such as Google Search), recommendation systems (offered by Netflix, YouTube or Amazon), driving internet traffic, targeted online advertising (AdSense, Facebook), virtual assistants (such as Siri or Alexa), autonomous vehicles (including drones, ADAS and self-driving cars), automatic language translation (Microsoft Translator, Google Translate), facial recognition (Apple's FaceID or Microsoft's DeepFace and Google's FaceNet) and image labeling (used by Facebook, Apple's Photos and TikTok). The deployment of AI may be overseen by a chief automation officer (CAO).

Health and medicine

Main article: Artificial intelligence in healthcare

The application of AI in medicine and medical research has the potential to improve patient care and quality of life.[134] Through the lens of the Hippocratic Oath, medical professionals are ethically compelled to use AI, if applications can more accurately diagnose and treat patients.[135][136] For medical research, AI is an important tool for processing and integrating big data. This is particularly important for organoid and tissue engineering development, which uses microscopy imaging as a key technique in fabrication.[137] It has been suggested that AI can overcome discrepancies in funding allocated to different fields of research.[137][138] New AI tools can deepen the understanding of biomedically relevant pathways. For example, AlphaFold 2 (2021) demonstrated the ability to approximate, in hours rather than months, the 3D structure of a protein.[139] In 2023, it was reported that AI-guided drug discovery helped find a class of antibiotics capable of killing two different types of drug-resistant bacteria.[140] In 2024, researchers used machine learning to accelerate the search for Parkinson's disease drug treatments.
Their aim was to identify compounds that block the clumping, or aggregation, of alpha-synuclein (the protein that characterises Parkinson's disease). They were able to speed up the initial screening process ten-fold and reduce the cost by a thousand-fold.[141][142] Games Main article: Artificial intelligence in video games Game playing programs have been used since the 1950s to demonstrate and test AI's most advanced techniques.[143] Deep Blue became the first computer chess-playing system to beat a reigning world chess champion, Garry Kasparov, on 11 May 1997.[144] In 2011, in a Jeopardy! quiz show exhibition match, IBM's question answering system, Watson, defeated the two greatest Jeopardy! champions, Brad Rutter and Ken Jennings, by a significant margin.[145] In March 2016, AlphaGo won 4 out of 5 games of Go in a match with Go champion Lee Sedol, becoming the first computer Go-playing system to beat a professional Go player without handicaps. Then, in 2017, it defeated Ke Jie, who was the best Go player in the world.[146] Other programs handle imperfect-information games, such as the poker-playing program Pluribus.[147] DeepMind developed increasingly generalistic reinforcement learning models, such as with MuZero, which could be trained to play chess, Go, or Atari games.[148] In 2019, DeepMind's AlphaStar achieved grandmaster level in StarCraft II, a particularly challenging real-time strategy game that involves incomplete knowledge of what happens on the map.[149] In 2021, an AI agent competed in a PlayStation Gran Turismo competition, winning against four of the world's best Gran Turismo drivers using deep reinforcement learning.[150] In 2024, Google DeepMind introduced SIMA, a type of AI capable of autonomously playing nine previously unseen open-world video games by observing screen output, as well as executing short, specific tasks in response to natural language instructions.[151] Mathematics Large language models, such as GPT-4, Gemini, Claude, Llama or 
Mistral, are increasingly used in mathematics. These probabilistic models are versatile, but can also produce wrong answers in the form of hallucinations. They sometimes need a large database of mathematical problems to learn from, but also methods such as supervised fine-tuning[152] or trained classifiers with human-annotated data to improve answers for new problems and learn from corrections.[153] A February 2024 study showed that the performance of some language models for reasoning capabilities in solving math problems not included in their training data was low, even for problems with only minor deviations from trained data.[154] One technique to improve their performance involves training the models to produce correct reasoning steps, rather than just the correct result.[155] The Alibaba Group developed a version of its Qwen models called Qwen2-Math, that achieved state-of-the-art performance on several mathematical benchmarks, including 84% accuracy on the MATH dataset of competition mathematics problems.[156] In January 2025, Microsoft proposed the technique rStar-Math that leverages Monte Carlo tree search and step-by-step reasoning, enabling a relatively small language model like Qwen-7B to solve 53% of the AIME 2024 and 90% of the MATH benchmark problems.[157] Alternatively, dedicated models for mathematical problem solving with higher precision for the outcome including proof of theorems have been developed such as AlphaTensor, AlphaGeometry, AlphaProof and AlphaEvolve[158] all from Google DeepMind,[159] Llemma from EleutherAI[160] or Julius.[161] When natural language is used to describe mathematical problems, converters can transform such prompts into a formal language such as Lean to define mathematical tasks. 
The experimental model Gemini Deep Think accepts natural language prompts directly and achieved gold medal results in the International Math Olympiad of 2025.[162] Some models have been developed to solve challenging problems and reach good results in benchmark tests, others to serve as educational tools in mathematics.[163] Topological deep learning integrates various topological approaches. Finance Finance is one of the fastest growing sectors where applied AI tools are being deployed: from retail online banking to investment advice and insurance, where automated "robot advisers" have been in use for some years.[164] According to Nicolas Firzli, director of the World Pensions & Investments Forum, it may be too early to see the emergence of highly innovative AI-informed financial products and services. He argues that "the deployment of AI tools will simply further automatise things: destroying tens of thousands of jobs in banking, financial planning, and pension advice in the process, but I'm not sure it will unleash a new wave of [e.g., sophisticated] pension innovation."[165] Military Main article: Military applications of artificial intelligence Various countries are deploying AI military applications.[166] The main applications enhance command and control, communications, sensors, integration and interoperability.[167] Research is targeting intelligence collection and analysis, logistics, cyber operations, information operations, and semiautonomous and autonomous vehicles.[166] AI technologies enable coordination of sensors and effectors, threat detection and identification, marking of enemy positions, target acquisition, coordination and deconfliction of distributed Joint Fires between networked combat vehicles, both human-operated and autonomous.[167] AI has been used in military operations in Iraq, Syria, Israel and Ukraine.[166][168][169][170] Generative AI Vincent van Gogh in watercolour created by generative AI software These paragraphs are an excerpt 
from Generative artificial intelligence. Generative artificial intelligence (Generative AI, GenAI,[171] or GAI) is a subfield of artificial intelligence that uses generative models to produce text, images, videos, or other forms of data.[172][173][174] These models learn the underlying patterns and structures of their training data and use them to produce new data[175][176] based on the input, which often comes in the form of natural language prompts.[177][178] Generative AI tools have become more common since the AI boom in the 2020s. This boom was made possible by improvements in transformer-based deep neural networks, particularly large language models (LLMs). Major tools include chatbots such as ChatGPT, Copilot, Gemini, Claude, Grok, and DeepSeek; text-to-image models such as Stable Diffusion, Midjourney, and DALL-E; and text-to-video models such as Veo, LTXV and Sora.[179][180][181][182][183] Technology companies developing generative AI include OpenAI, Anthropic, Meta AI, Microsoft, Google, DeepSeek, and Baidu.[177][184][185] Generative AI has raised many ethical questions and governance challenges as it can be used for cybercrime, or to deceive or manipulate people through fake news or deepfakes.[186][187] Even if used ethically, it may lead to mass replacement of human jobs.[188] The tools themselves have been criticized as violating intellectual property laws, since they are trained on copyrighted works.[189] Agents Main article: Agentic AI AI agents are software entities designed to perceive their environment, make decisions, and take actions autonomously to achieve specific goals. These agents can interact with users, their environment, or other agents. AI agents are used in various applications, including virtual assistants, chatbots, autonomous vehicles, game-playing systems, and industrial robotics. AI agents operate within the constraints of their programming, available computational resources, and hardware limitations. 
This means they are restricted to performing tasks within their defined scope and have finite memory and processing capabilities. In real-world applications, AI agents often face time constraints for decision-making and action execution. Many AI agents incorporate learning algorithms, enabling them to improve their performance over time through experience or training. Using machine learning, AI agents can adapt to new situations and optimise their behaviour for their designated tasks.[190][191][192] Sexuality Applications of AI in this domain include AI-enabled menstruation and fertility trackers that analyze user data to offer predictions,[193] AI-integrated sex toys (e.g., teledildonics),[194] AI-generated sexual education content,[195] and AI agents that simulate sexual and romantic partners (e.g., Replika).[196] AI is also used for the production of non-consensual deepfake pornography, raising significant ethical and legal concerns.[197] AI technologies have also been used to attempt to identify online gender-based violence and online sexual grooming of minors.[198][199] Other industry-specific tasks There are also thousands of successful AI applications used to solve specific problems for specific industries or institutions. In a 2017 survey, one in five companies reported having incorporated "AI" in some offerings or processes.[200] A few examples are energy storage, medical diagnosis, military logistics, applications that predict the result of judicial decisions, foreign policy, or supply chain management. AI applications for evacuation and disaster management are growing. AI has been used to investigate patterns in large-scale and small-scale evacuations using historical data from GPS, videos or social media. Furthermore, AI can provide real-time information on the evacuation conditions.[201][202][203] In agriculture, AI has helped farmers to increase yield and identify areas that need irrigation, fertilization, pesticide treatments. 
Agronomists use AI to conduct research and development. AI has been used to predict the ripening time for crops such as tomatoes, monitor soil moisture, operate agricultural robots, conduct predictive analytics, classify livestock pig call emotions, automate greenhouses, detect diseases and pests, and save water. Artificial intelligence is used in astronomy to analyze increasing amounts of available data and applications, mainly for "classification, regression, clustering, forecasting, generation, discovery, and the development of new scientific insights." For example, it is used for discovering exoplanets, forecasting solar activity, and distinguishing between signals and instrumental effects in gravitational wave astronomy. Additionally, it could be used for activities in space, such as space exploration, including the analysis of data from space missions, real-time science decisions of spacecraft, space debris avoidance, and more autonomous operation. During the 2024 Indian elections, US$50 million was spent on authorized AI-generated content, notably by creating deepfakes of allied (including sometimes deceased) politicians to better engage with voters, and by translating speeches to various local languages.[204] Ethics Main article: Ethics of artificial intelligence Street art in Tel Aviv[205][206] AI has potential benefits and potential risks.[207] AI may be able to advance science and find solutions for serious problems: Demis Hassabis of DeepMind hopes to "solve intelligence, and then use that to solve everything else".[208] However, as the use of AI has become widespread, several unintended consequences and risks have been identified.[209][210] In-production systems can sometimes not factor ethics and bias into their AI training processes, especially when the AI algorithms are inherently unexplainable in deep learning.[211] Risks and harm Privacy and copyright Further information: Information privacy and Artificial intelligence and copyright Machine 
learning algorithms require large amounts of data. The techniques used to acquire this data have raised concerns about privacy, surveillance and copyright. AI-powered devices and services, such as virtual assistants and IoT products, continuously collect personal information, raising concerns about intrusive data gathering and unauthorized access by third parties. The loss of privacy is further exacerbated by AI's ability to process and combine vast amounts of data, potentially leading to a surveillance society where individual activities are constantly monitored and analyzed without adequate safeguards or transparency. Sensitive user data collected may include online activity records, geolocation data, video, or audio.[212] For example, in order to build speech recognition algorithms, Amazon has recorded millions of private conversations and allowed temporary workers to listen to and transcribe some of them.[213] Opinions about this widespread surveillance range from those who see it as a necessary evil to those for whom it is clearly unethical and a violation of the right to privacy.[214] AI developers argue that this is the only way to deliver valuable applications and have developed several techniques that attempt to preserve privacy while still obtaining the data, such as data aggregation, de-identification and differential privacy.[215] Since 2016, some privacy experts, such as Cynthia Dwork, have begun to view privacy in terms of fairness. Brian Christian wrote that experts have pivoted "from the question of 'what they know' to the question of 'what they're doing with it'."[216] Generative AI is often trained on unlicensed copyrighted works, including in domains such as images or computer code; the output is then used under the rationale of "fair use". 
Experts disagree about how well and under what circumstances this rationale will hold up in courts of law; relevant factors may include "the purpose and character of the use of the copyrighted work" and "the effect upon the potential market for the copyrighted work".[217][218] Website owners who do not wish to have their content scraped can indicate it in a "robots.txt" file.[219] In 2023, leading authors (including John Grisham and Jonathan Franzen) sued AI companies for using their work to train generative AI.[220][221] Another discussed approach is to envision a separate sui generis system of protection for creations generated by AI to ensure fair attribution and compensation for human authors.[222] Dominance by tech giants The commercial AI scene is dominated by Big Tech companies such as Alphabet Inc., Amazon, Apple Inc., Meta Platforms, and Microsoft.[223][224][225] Some of these players already own the vast majority of existing cloud infrastructure and computing power from data centers, allowing them to entrench further in the marketplace.[226][227] Power needs and environmental impacts See also: Environmental impacts of artificial intelligence In January 2024, the International Energy Agency (IEA) released Electricity 2024, Analysis and Forecast to 2026, forecasting electric power use.[228] This is the first IEA report to make projections for data centers and power consumption for artificial intelligence and cryptocurrency. The report states that power demand for these uses might double by 2026, with additional electric power usage equal to electricity used by the whole Japanese nation.[229] Prodigious power consumption by AI is responsible for the growth of fossil fuel use, and might delay closings of obsolete, carbon-emitting coal energy facilities. There is a feverish rise in the construction of data centers throughout the US, making large technology firms (e.g., Microsoft, Meta, Google, Amazon) into voracious consumers of electric power. 
Projected electric consumption is so immense that there is concern that it will be fulfilled no matter the source. A ChatGPT search involves the use of 10 times the electrical energy as a Google search. The large firms are in haste to find power sources – from nuclear energy to geothermal to fusion. The tech firms argue that – in the long view – AI will be eventually kinder to the environment, but they need the energy now. AI makes the power grid more efficient and "intelligent", will assist in the growth of nuclear power, and track overall carbon emissions, according to technology firms.[230] A 2024 Goldman Sachs Research Paper, AI Data Centers and the Coming US Power Demand Surge, found "US power demand (is) likely to experience growth not seen in a generation...." and forecasts that, by 2030, US data centers will consume 8% of US power, as opposed to 3% in 2022, presaging growth for the electrical power generation industry by a variety of means.[231] Data centers' need for more and more electrical power is such that they might max out the electrical grid. The Big Tech companies counter that AI can be used to maximize the utilization of the grid by all.[232] In 2024, the Wall Street Journal reported that big AI companies have begun negotiations with the US nuclear power providers to provide electricity to the data centers. In March 2024 Amazon purchased a Pennsylvania nuclear-powered data center for US$650 million.[233] Nvidia CEO Jensen Huang said nuclear power is a good option for the data centers.[234] In September 2024, Microsoft announced an agreement with Constellation Energy to re-open the Three Mile Island nuclear power plant to provide Microsoft with 100% of all electric power produced by the plant for 20 years. 
Reopening the plant, which suffered a partial nuclear meltdown of its Unit 2 reactor in 1979, will require Constellation to get through strict regulatory processes which will include extensive safety scrutiny from the US Nuclear Regulatory Commission. If approved (this will be the first ever US re-commissioning of a nuclear plant), the plant will produce over 835 megawatts of power, enough for 800,000 homes. The cost for re-opening and upgrading is estimated at US$1.6 billion and is dependent on tax breaks for nuclear power contained in the 2022 US Inflation Reduction Act.[235] The US government and the state of Michigan are investing almost US$2 billion to reopen the Palisades Nuclear reactor on Lake Michigan. Closed since 2022, the plant is planned to be reopened in October 2025. The Three Mile Island facility will be renamed the Crane Clean Energy Center after Chris Crane, a nuclear proponent and former CEO of Exelon who was responsible for Exelon's spinoff of Constellation.[236] After the last approval in September 2023, Taiwan suspended the approval of data centers north of Taoyuan with a capacity of more than 5 MW in 2024, due to power supply shortages.[237] Taiwan aims to phase out nuclear power by 2025.[237] On the other hand, Singapore imposed a ban on the opening of data centers in 2019 due to electric power constraints, but lifted this ban in 2022.[237] Although most nuclear plants in Japan have been shut down after the 2011 Fukushima nuclear accident, according to an October 2024 Bloomberg article in Japanese, cloud gaming services company Ubitus, in which Nvidia has a stake, is looking for land in Japan near a nuclear power plant for a new data center for generative AI.[238] Ubitus CEO Wesley Kuo said nuclear power plants are the most efficient, cheap and stable power for AI.[238] On 1 November 2024, the Federal Energy Regulatory Commission (FERC) rejected an application submitted by Talen Energy for approval to supply some electricity from the nuclear power 
station Susquehanna to Amazon's data center.[239] According to the Commission Chairman Willie L. Phillips, it is a burden on the electricity grid as well as a significant cost shifting concern to households and other business sectors.[239] In 2025, a report prepared by the International Energy Agency estimated the greenhouse gas emissions from the energy consumption of AI at 180 million tons. By 2035, these emissions could rise to 300–500 million tonnes depending on what measures will be taken. This is below 1.5% of the energy sector emissions. The emissions reduction potential of AI was estimated at 5% of the energy sector emissions, but rebound effects (for example if people switch from public transport to autonomous cars) can reduce it.[240] Misinformation See also: YouTube § Moderation and offensive content YouTube, Facebook and others use recommender systems to guide users to more content. These AI programs were given the goal of maximizing user engagement (that is, the only goal was to keep people watching). The AI learned that users tended to choose misinformation, conspiracy theories, and extreme partisan content, and, to keep them watching, the AI recommended more of it. Users also tended to watch more content on the same subject, so the AI led people into filter bubbles where they received multiple versions of the same misinformation.[241] This convinced many users that the misinformation was true, and ultimately undermined trust in institutions, the media and the government.[242] The AI program had correctly learned to maximize its goal, but the result was harmful to society. After the U.S. 
election in 2016, major technology companies took some steps to mitigate the problem.[243] In the early 2020s, generative AI began to create images, audio, and texts that are virtually indistinguishable from real photographs, recordings, or human writing,[244] while realistic AI-generated videos became feasible in the mid-2020s.[245][246][247] It is possible for bad actors to use this technology to create massive amounts of misinformation or propaganda;[248] one such potential malicious use is deepfakes for computational propaganda.[249] AI pioneer Geoffrey Hinton expressed concern about AI enabling "authoritarian leaders to manipulate their electorates" on a large scale, among other risks.[250] AI researchers at Microsoft, OpenAI, universities and other organisations have suggested using "personhood credentials" as a way to overcome online deception enabled by AI models.[251] Algorithmic bias and fairness Main articles: Algorithmic bias and Fairness (machine learning) Machine learning applications will be biased[k] if they learn from biased data.[253] The developers may not be aware that the bias exists.[254] Bias can be introduced by the way training data is selected and by the way a model is deployed.[255][253] If a biased algorithm is used to make decisions that can seriously harm people (as it can in medicine, finance, recruitment, housing or policing) then the algorithm may cause discrimination.[256] The field of fairness studies how to prevent harms from algorithmic biases. On June 28, 2015, Google Photos's new image labeling feature mistakenly identified Jacky Alcine and a friend as "gorillas" because they were black. The system was trained on a dataset that contained very few images of black people,[257] a problem called "sample size disparity".[258] Google "fixed" this problem by preventing the system from labelling anything as a "gorilla". 
Eight years later, in 2023, Google Photos still could not identify a gorilla, and neither could similar products from Apple, Facebook, Microsoft and Amazon.[259] COMPAS is a commercial program widely used by U.S. courts to assess the likelihood of a defendant becoming a recidivist. In 2016, Julia Angwin at ProPublica discovered that COMPAS exhibited racial bias, despite the fact that the program was not told the races of the defendants. Although the error rate for both whites and blacks was calibrated to be equal at exactly 61%, the errors for each race were different: the system consistently overestimated the chance that a black person would re-offend and underestimated the chance that a white person would re-offend.[260] In 2017, several researchers[l] showed that it was mathematically impossible for COMPAS to accommodate all possible measures of fairness when the base rates of re-offense were different for whites and blacks in the data.[262] A program can make biased decisions even if the data does not explicitly mention a problematic feature (such as "race" or "gender"). The feature will correlate with other features (like "address", "shopping history" or "first name"), and the program will make the same decisions based on these features as it would on "race" or "gender".[263] Moritz Hardt said "the most robust fact in this research area is that fairness through blindness doesn't work."[264] Criticism of COMPAS highlighted that machine learning models are designed to make "predictions" that are only valid if we assume that the future will resemble the past. If they are trained on data that includes the results of racist decisions in the past, machine learning models must predict that racist decisions will be made in the future. 
If an application then uses these predictions as recommendations, some of these "recommendations" will likely be racist.[265] Thus, machine learning is not well suited to help make decisions in areas where there is hope that the future will be better than the past. It is descriptive rather than prescriptive.[m] Bias and unfairness may go undetected because the developers are overwhelmingly white and male: among AI engineers, about 4% are black and 20% are women.[258] There are various conflicting definitions and mathematical models of fairness. These notions depend on ethical assumptions, and are influenced by beliefs about society. One broad category is distributive fairness, which focuses on the outcomes, often identifying groups and seeking to compensate for statistical disparities. Representational fairness tries to ensure that AI systems do not reinforce negative stereotypes or render certain groups invisible. Procedural fairness focuses on the decision process rather than the outcome. The most relevant notions of fairness may depend on the context, notably the type of AI application and the stakeholders. The subjectivity in the notions of bias and fairness makes it difficult for companies to operationalize them. 
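The tension between fairness notions described above can be illustrated with a minimal sketch. It uses a small synthetic dataset (not COMPAS data), and the names `GroupRates` and `group_rates` are hypothetical helpers introduced only for this illustration. The two groups receive predictions with the same overall accuracy, yet one group bears all the false positives and the other all the false negatives, so a model can look fair under one metric and unfair under another.

```python
from dataclasses import dataclass


@dataclass
class GroupRates:
    accuracy: float
    false_positive_rate: float
    false_negative_rate: float
    positive_rate: float  # share of the group predicted positive


def group_rates(y_true, y_pred):
    """Compute basic confusion-matrix rates for one group (labels are 0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    return GroupRates(
        accuracy=(tp + tn) / n,
        false_positive_rate=fp / (fp + tn) if (fp + tn) else 0.0,
        false_negative_rate=fn / (fn + tp) if (fn + tp) else 0.0,
        positive_rate=(tp + fp) / n,
    )


# Synthetic example: both groups have the same true outcomes.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# Group A: the model over-predicts the positive ("high risk") class.
pred_a = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
# Group B: the model under-predicts the positive class.
pred_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

rates_a = group_rates(y_true, pred_a)
rates_b = group_rates(y_true, pred_b)

print("Group A:", rates_a)
print("Group B:", rates_b)
```

In this toy setup, comparing accuracy alone would suggest equal treatment, while comparing false positive rates, false negative rates, or the share predicted positive (a simple distributive measure) would not. Which comparison matters depends on the application and stakeholders.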
Having access to sensitive attributes such as race or gender is also considered by many AI ethicists to be necessary in order to compensate for biases, but it may conflict with anti-discrimination laws.[252] At its 2022 Conference on Fairness, Accountability, and Transparency (ACM FAccT 2022), the Association for Computing Machinery, in Seoul, South Korea, presented and published findings that recommend that until AI and robotics systems are demonstrated to be free of bias mistakes, they are unsafe, and the use of self-learning neural networks trained on vast, unregulated sources of flawed internet data should be curtailed.[267] Lack of transparency See also: Explainable AI, Algorithmic transparency, and Right to explanation Many AI systems are so complex that their designers cannot explain how they reach their decisions.[268] This is particularly true of deep neural networks, in which there are many non-linear relationships between inputs and outputs. However, some popular explainability techniques exist.[269] It is impossible to be certain that a program is operating correctly if no one knows how exactly it works. There have been many cases where a machine learning program passed rigorous tests, but nevertheless learned something different from what the programmers intended. For example, a system that could identify skin diseases better than medical professionals was found to actually have a strong tendency to classify images with a ruler as "cancerous", because pictures of malignancies typically include a ruler to show the scale.[270] Another machine learning system designed to help effectively allocate medical resources was found to classify patients with asthma as being at "low risk" of dying from pneumonia. Having asthma is actually a severe risk factor, but since the patients having asthma would usually get much more medical care, they were relatively unlikely to die according to the training data. 
The correlation between asthma and low risk of dying from pneumonia was real, but misleading.[271] People who have been harmed by an algorithm's decision have a right to an explanation.[272] Doctors, for example, are expected to clearly and completely explain to their colleagues the reasoning behind any decision they make. Early drafts of the European Union's General Data Protection Regulation in 2016 included an explicit statement that this right exists.[n] Industry experts noted that this is an unsolved problem with no solution in sight. Regulators argued that nevertheless the harm is real: if the problem has no solution, the tools should not be used.[273] DARPA established the XAI ("Explainable Artificial Intelligence") program in 2014 to try to solve these problems.[274] Several approaches aim to address the transparency problem. SHAP makes it possible to visualise the contribution of each feature to the output.[275] LIME can locally approximate a model's outputs with a simpler, interpretable model.[276] Multitask learning provides a large number of outputs in addition to the target classification. These other outputs can help developers deduce what the network has learned.[277] Deconvolution, DeepDream and other generative methods can allow developers to see what different layers of a deep network for computer vision have learned, and produce output that can suggest what the network is learning.[278] For generative pre-trained transformers, Anthropic developed a technique based on dictionary learning that associates patterns of neuron activations with human-understandable concepts.[279] Bad actors and weaponized AI Main articles: Lethal autonomous weapon, Artificial intelligence arms race, and AI safety Artificial intelligence provides a number of tools that are useful to bad actors, such as authoritarian governments, terrorists, criminals or rogue states. 
A lethal autonomous weapon is a machine that locates, selects and engages human targets without human supervision.[o] Widely available AI tools can be used by bad actors to develop inexpensive autonomous weapons and, if produced at scale, they are potentially weapons of mass destruction.[281] Even when used in conventional warfare, they currently cannot reliably choose targets and could potentially kill an innocent person.[281] In 2014, 30 nations (including China) supported a ban on autonomous weapons under the United Nations' Convention on Certain Conventional Weapons; however, the United States and others disagreed.[282] By 2015, over fifty countries were reported to be researching battlefield robots.[283] AI tools make it easier for authoritarian governments to efficiently control their citizens in several ways. Face and voice recognition allow widespread surveillance. Machine learning, operating on this data, can classify potential enemies of the state and prevent them from hiding. Recommendation systems can precisely target propaganda and misinformation for maximum effect. Deepfakes and generative AI aid in producing misinformation. Advanced AI can make authoritarian centralized decision-making more competitive than liberal and decentralized systems such as markets. It lowers the cost and difficulty of digital warfare and advanced spyware.[284] All these technologies have been available since 2020 or earlier; AI facial recognition systems are already being used for mass surveillance in China.[285][286] There are many other ways in which AI is expected to help bad actors, some of which cannot be foreseen. 
For example, machine-learning AI is able to design tens of thousands of toxic molecules in a matter of hours.[287] Technological unemployment Main articles: Workplace impact of artificial intelligence and Technological unemployment Economists have frequently highlighted the risks of redundancies from AI, and speculated about unemployment if there is no adequate social policy for full employment.[288] In the past, technology has tended to increase rather than reduce total employment, but economists acknowledge that "we're in uncharted territory" with AI.[289] A survey of economists showed disagreement about whether the increasing use of robots and AI will cause a substantial increase in long-term unemployment, but they generally agree that it could be a net benefit if productivity gains are redistributed.[290] Risk estimates vary; for example, in the 2010s, Michael Osborne and Carl Benedikt Frey estimated 47% of U.S. jobs are at "high risk" of potential automation, while an OECD report classified only 9% of U.S. 
jobs as "high risk".[p][292] The methodology of speculating about future employment levels has been criticised as lacking evidential foundation, and for implying that technology, rather than social policy, creates unemployment, as opposed to redundancies.[288] In April 2023, it was reported that 70% of the jobs for Chinese video game illustrators had been eliminated by generative artificial intelligence.[293][294] Unlike previous waves of automation, many middle-class jobs may be eliminated by artificial intelligence; The Economist stated in 2015 that "the worry that AI could do to white-collar jobs what steam power did to blue-collar ones during the Industrial Revolution" is "worth taking seriously".[295] Jobs at extreme risk range from paralegals to fast food cooks, while job demand is likely to increase for care-related professions ranging from personal healthcare to the clergy.[296] From the early days of the development of artificial intelligence, there have been arguments, for example, those put forward by Joseph Weizenbaum, about whether tasks that can be done by computers actually should be done by them, given the difference between computers and humans, and between quantitative calculation and qualitative, value-based judgement.[297] Existential risk Main article: Existential risk from artificial intelligence It has been argued AI will become so powerful that humanity may irreversibly lose control of it. This could, as physicist Stephen Hawking stated, "spell the end of the human race".[298] This scenario has been common in science fiction, when a computer or robot suddenly develops a human-like "self-awareness" (or "sentience" or "consciousness") and becomes a malevolent character.[q] These sci-fi scenarios are misleading in several ways. First, AI does not require human-like sentience to be an existential risk. Modern AI programs are given specific goals and use learning and intelligence to achieve them. 
Philosopher Nick Bostrom argued that if one gives almost any goal to a sufficiently powerful AI, it may choose to destroy humanity to achieve it (he used the example of a paperclip maximizer).[300] Stuart Russell gives the example of a household robot that tries to find a way to kill its owner to prevent it from being unplugged, reasoning that "you can't fetch the coffee if you're dead."[301] In order to be safe for humanity, a superintelligence would have to be genuinely aligned with humanity's morality and values so that it is "fundamentally on our side".[302] Second, Yuval Noah Harari argues that AI does not require a robot body or physical control to pose an existential risk. The essential parts of civilization are not physical. Things like ideologies, law, government, money and the economy are built on language; they exist because there are stories that billions of people believe. The current prevalence of misinformation suggests that an AI could use language to convince people to believe anything, even to take actions that are destructive.[303] The opinions amongst experts and industry insiders are mixed, with sizable fractions both concerned and unconcerned by risk from eventual superintelligent AI.[304] Personalities such as Stephen Hawking, Bill Gates, and Elon Musk,[305] as well as AI pioneers such as Yoshua Bengio, Stuart Russell, Demis Hassabis, and Sam Altman, have expressed concerns about existential risk from AI.
In May 2023, Geoffrey Hinton announced his resignation from Google in order to be able to "freely speak out about the risks of AI" without "considering how this impacts Google".[306] He notably mentioned risks of an AI takeover,[307] and stressed that in order to avoid the worst outcomes, establishing safety guidelines will require cooperation among those competing in use of AI.[308] In 2023, many leading AI experts endorsed the joint statement that "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war".[309] Some other researchers were more optimistic. AI pioneer Jürgen Schmidhuber did not sign the joint statement, emphasising that in 95% of all cases, AI research is about making "human lives longer and healthier and easier."[310] While the tools that are now being used to improve lives can also be used by bad actors, "they can also be used against the bad actors."[311][312] Andrew Ng also argued that "it's a mistake to fall for the doomsday hype on AI—and that regulators who do will only benefit vested interests."[313] Yann LeCun "scoffs at his peers' dystopian scenarios of supercharged misinformation and even, eventually, human extinction."[314] In the early 2010s, experts argued that the risks are too distant in the future to warrant research or that humans will be valuable from the perspective of a superintelligent machine.[315] However, after 2016, the study of current and future risks and possible solutions became a serious area of research.[316] Ethical machines and alignment Main articles: Machine ethics, AI safety, Friendly artificial intelligence, Artificial moral agents, and Human Compatible Friendly AI are machines that have been designed from the beginning to minimize risks and to make choices that benefit humans. 
Eliezer Yudkowsky, who coined the term, argues that developing friendly AI should be a higher research priority: it may require a large investment and it must be completed before AI becomes an existential risk.[317] Machines with intelligence have the potential to use their intelligence to make ethical decisions. The field of machine ethics provides machines with ethical principles and procedures for resolving ethical dilemmas.[318] The field of machine ethics is also called computational morality,[318] and was founded at an AAAI symposium in 2005.[319] Other approaches include Wendell Wallach's "artificial moral agents"[320] and Stuart J. Russell's three principles for developing provably beneficial machines.[321] Open source See also: Lists of open-source artificial intelligence software Active organizations in the AI open-source community include Hugging Face,[322] Google,[323] EleutherAI and Meta.[324] Various AI models, such as Llama 2, Mistral or Stable Diffusion, have been made open-weight,[325][326] meaning that their architecture and trained parameters (the "weights") are publicly available. Open-weight models can be freely fine-tuned, which allows companies to specialize them with their own data and for their own use-case.[327] Open-weight models are useful for research and innovation but can also be misused. Since they can be fine-tuned, any built-in security measure, such as objecting to harmful requests, can be trained away until it becomes ineffective. Some researchers warn that future AI models may develop dangerous capabilities (such as the potential to drastically facilitate bioterrorism) and that once released on the Internet, they cannot be deleted everywhere if needed. They recommend pre-release audits and cost-benefit analyses.[328] Frameworks Artificial intelligence projects can be guided by ethical considerations during the design, development, and implementation of an AI system. 
An AI framework such as the Care and Act Framework, developed by the Alan Turing Institute and based on the SUM values, outlines four main ethical dimensions, defined as follows:[329][330]

Respect the dignity of individual people
Connect with other people sincerely, openly, and inclusively
Care for the wellbeing of everyone
Protect social values, justice, and the public interest

Other developments in ethical frameworks include those decided upon during the Asilomar Conference, the Montreal Declaration for Responsible AI, and the IEEE's Ethics of Autonomous Systems initiative, among others;[331] however, these principles are not without criticism, especially regarding the people chosen to contribute to these frameworks.[332]

Promotion of the wellbeing of the people and communities that these technologies affect requires consideration of the social and ethical implications at all stages of AI system design, development and implementation, and collaboration between job roles such as data scientists, product managers, data engineers, domain experts, and delivery managers.[333]

In 2024, the UK AI Safety Institute released 'Inspect', a testing toolset for AI safety evaluations. The toolset is released under an MIT open-source licence, is freely available on GitHub, and can be improved with third-party packages. It can be used to evaluate AI models in a range of areas including core knowledge, ability to reason, and autonomous capabilities.[334]

Regulation

Main articles: Regulation of artificial intelligence, Regulation of algorithms, and AI safety

The first global AI Safety Summit was held in the United Kingdom in November 2023 with a declaration calling for international cooperation.
The regulation of artificial intelligence is the development of public sector policies and laws for promoting and regulating AI; it is therefore related to the broader regulation of algorithms.[335] The regulatory and policy landscape for AI is an emerging issue in jurisdictions globally.[336] According to AI Index at Stanford, the annual number of AI-related laws passed in the 127 survey countries jumped from one passed in 2016 to 37 passed in 2022 alone.[337][338] Between 2016 and 2020, more than 30 countries adopted dedicated strategies for AI.[339] Most EU member states had released national AI strategies, as had Canada, China, India, Japan, Mauritius, the Russian Federation, Saudi Arabia, United Arab Emirates, U.S., and Vietnam. Others were in the process of elaborating their own AI strategy, including Bangladesh, Malaysia and Tunisia.[339] The Global Partnership on Artificial Intelligence was launched in June 2020, stating a need for AI to be developed in accordance with human rights and democratic values, to ensure public confidence and trust in the technology.[339] Henry Kissinger, Eric Schmidt, and Daniel Huttenlocher published a joint statement in November 2021 calling for a government commission to regulate AI.[340] In 2023, OpenAI leaders published recommendations for the governance of superintelligence, which they believe may happen in less than 10 years.[341] In 2023, the United Nations also launched an advisory body to provide recommendations on AI governance; the body comprises technology company executives, government officials and academics.[342] In 2024, the Council of Europe created the first international legally binding treaty on AI, called the "Framework Convention on Artificial Intelligence and Human Rights, Democracy and the Rule of Law". 
It was adopted by the European Union, the United States, the United Kingdom, and other signatories.[343] In a 2022 Ipsos survey, attitudes towards AI varied greatly by country; 78% of Chinese citizens, but only 35% of Americans, agreed that "products and services using AI have more benefits than drawbacks".[337] A 2023 Reuters/Ipsos poll found that 61% of Americans agree, and 22% disagree, that AI poses risks to humanity.[344] In a 2023 Fox News poll, 35% of Americans thought it "very important", and an additional 41% thought it "somewhat important", for the federal government to regulate AI, versus 13% responding "not very important" and 8% responding "not at all important".[345][346] In November 2023, the first global AI Safety Summit was held in Bletchley Park in the UK to discuss the near and far term risks of AI and the possibility of mandatory and voluntary regulatory frameworks.[347] 28 countries including the United States, China, and the European Union issued a declaration at the start of the summit, calling for international co-operation to manage the challenges and risks of artificial intelligence.[348][349] In May 2024 at the AI Seoul Summit, 16 global AI tech companies agreed to safety commitments on the development of AI.[350][351] History Main article: History of artificial intelligence For a chronological guide, see Timeline of artificial intelligence. In 2024, AI patents in China and the US numbered more than three-fourths of AI patents worldwide.[352] Though China had more AI patents, the US had 35% more patents per AI patent-applicant company than China.[352] The study of mechanical or "formal" reasoning began with philosophers and mathematicians in antiquity. 
The study of logic led directly to Alan Turing's theory of computation, which suggested that a machine, by shuffling symbols as simple as "0" and "1", could simulate any conceivable form of mathematical reasoning.[353][354] This, along with concurrent discoveries in cybernetics, information theory and neurobiology, led researchers to consider the possibility of building an "electronic brain".[r] They developed several areas of research that would become part of AI,[356] such as McCulloch and Pitts's design for "artificial neurons" in 1943,[117] and Turing's influential 1950 paper 'Computing Machinery and Intelligence', which introduced the Turing test and showed that "machine intelligence" was plausible.[357][354] The field of AI research was founded at a workshop at Dartmouth College in 1956.[s][6] The attendees became the leaders of AI research in the 1960s.[t] They and their students produced programs that the press described as "astonishing":[u] computers were learning checkers strategies, solving word problems in algebra, proving logical theorems and speaking English.[v][7] Artificial intelligence laboratories were set up at a number of British and U.S. universities in the late 1950s and early 1960s.[354] Researchers in the 1960s and the 1970s were convinced that their methods would eventually succeed in creating a machine with general intelligence and considered this the goal of their field.[361] In 1965 Herbert Simon predicted, "machines will be capable, within twenty years, of doing any work a man can do".[362] In 1967 Marvin Minsky agreed, writing that "within a generation ... the problem of creating 'artificial intelligence' will substantially be solved".[363] They had, however, underestimated the difficulty of the problem.[w] In 1974, both the U.S. and British governments cut off exploratory research in response to the criticism of Sir James Lighthill[365] and ongoing pressure from the U.S.
Congress to fund more productive projects.[366] Minsky and Papert's book Perceptrons was understood as proving that artificial neural networks would never be useful for solving real-world tasks, thus discrediting the approach altogether.[367] The "AI winter", a period when obtaining funding for AI projects was difficult, followed.[9] In the early 1980s, AI research was revived by the commercial success of expert systems,[368] a form of AI program that simulated the knowledge and analytical skills of human experts. By 1985, the market for AI had reached over a billion dollars. At the same time, Japan's fifth generation computer project inspired the U.S. and British governments to restore funding for academic research.[8] However, beginning with the collapse of the Lisp Machine market in 1987, AI once again fell into disrepute, and a second, longer-lasting winter began.[10] Up to this point, most of AI's funding had gone to projects that used high-level symbols to represent mental objects like plans, goals, beliefs, and known facts. 
In the 1980s, some researchers began to doubt that this approach would be able to imitate all the processes of human cognition, especially perception, robotics, learning and pattern recognition,[369] and began to look into "sub-symbolic" approaches.[370] Rodney Brooks rejected "representation" in general and focussed directly on engineering machines that move and survive.[x] Judea Pearl, Lotfi Zadeh, and others developed methods that handled incomplete and uncertain information by making reasonable guesses rather than precise logic.[87][375] But the most important development was the revival of "connectionism", including neural network research, by Geoffrey Hinton and others.[376] In 1990, Yann LeCun successfully showed that convolutional neural networks can recognize handwritten digits, the first of many successful applications of neural networks.[377] AI gradually restored its reputation in the late 1990s and early 21st century by exploiting formal mathematical methods and by finding specific solutions to specific problems. This "narrow" and "formal" focus allowed researchers to produce verifiable results and collaborate with other fields (such as statistics, economics and mathematics).[378] By 2000, solutions developed by AI researchers were being widely used, although in the 1990s they were rarely described as "artificial intelligence" (a tendency known as the AI effect).[379] However, several academic researchers became concerned that AI was no longer pursuing its original goal of creating versatile, fully intelligent machines. 
Beginning around 2002, they founded the subfield of artificial general intelligence (or "AGI"), which had several well-funded institutions by the 2010s.[68] Deep learning began to dominate industry benchmarks in 2012 and was adopted throughout the field.[11] For many specific tasks, other methods were abandoned.[y] Deep learning's success was based on both hardware improvements (faster computers,[381] graphics processing units, cloud computing[382]) and access to large amounts of data[383] (including curated datasets,[382] such as ImageNet). Deep learning's success led to an enormous increase in interest and funding in AI.[z] The amount of machine learning research (measured by total publications) increased by 50% in the years 2015–2019.[339] The number of Google searches for the term "AI" accelerated in 2022. In 2016, issues of fairness and the misuse of technology were catapulted into center stage at machine learning conferences, publications vastly increased, funding became available, and many researchers re-focussed their careers on these issues. The alignment problem became a serious field of academic study.[316] In the late 2010s and early 2020s, AGI companies began to deliver programs that created enormous interest. In 2015, AlphaGo, developed by DeepMind, beat the world champion Go player. The program was taught only the game's rules and developed a strategy by itself. GPT-3 is a large language model that was released in 2020 by OpenAI and is capable of generating high-quality human-like text.[384] ChatGPT, launched on November 30, 2022, became the fastest-growing consumer software application in history, gaining over 100 million users in two months.[385] It marked what is widely regarded as AI's breakout year, bringing it into the public consciousness.[386] These programs, and others, inspired an aggressive AI boom, where large companies began investing billions of dollars in AI research.
According to AI Impacts, about US$50 billion annually was invested in "AI" around 2022 in the U.S. alone and about 20% of the new U.S. Computer Science PhD graduates have specialized in "AI".[387] About 800,000 "AI"-related U.S. job openings existed in 2022.[388] According to PitchBook research, 22% of newly funded startups in 2024 claimed to be AI companies.[389] Philosophy Main article: Philosophy of artificial intelligence Philosophical debates have historically sought to determine the nature of intelligence and how to make intelligent machines.[390] Another major focus has been whether machines can be conscious, and the associated ethical implications.[391] Many other topics in philosophy are relevant to AI, such as epistemology and free will.[392] Rapid advancements have intensified public discussions on the philosophy and ethics of AI.[391] Defining artificial intelligence See also: Synthetic intelligence, Intelligent agent, Artificial mind, Virtual intelligence, and Dartmouth workshop Alan Turing wrote in 1950 "I propose to consider the question 'can machines think'?"[393] He advised changing the question from whether a machine "thinks", to "whether or not it is possible for machinery to show intelligent behaviour".[393] He devised the Turing test, which measures the ability of a machine to simulate human conversation.[357] Since we can only observe the behavior of the machine, it does not matter if it is "actually" thinking or literally has a "mind". Turing notes that we can not determine these things about other people but "it is usual to have a polite convention that everyone thinks."[394] The Turing test can provide some evidence of intelligence, but it penalizes non-human intelligent behavior.[395] Russell and Norvig agree with Turing that intelligence must be defined in terms of external behavior, not internal structure.[1] However, they are critical that the test requires the machine to imitate humans. 
"Aeronautical engineering texts", they wrote, "do not define the goal of their field as making 'machines that fly so exactly like pigeons that they can fool other pigeons.'"[396] AI founder John McCarthy agreed, writing that "Artificial intelligence is not, by definition, simulation of human intelligence".[397] McCarthy defines intelligence as "the computational part of the ability to achieve goals in the world".[398] Another AI founder, Marvin Minsky, similarly describes it as "the ability to solve hard problems".[399] The leading AI textbook defines it as the study of agents that perceive their environment and take actions that maximize their chances of achieving defined goals.[1] These definitions view intelligence in terms of well-defined problems with well-defined solutions, where both the difficulty of the problem and the performance of the program are direct measures of the "intelligence" of the machine—and no other philosophical discussion is required, or may not even be possible. Another definition has been adopted by Google,[400] a major practitioner in the field of AI. This definition stipulates the ability of systems to synthesize information as the manifestation of intelligence, similar to the way it is defined in biological intelligence. 
Some authors have suggested that, in practice, the definition of AI is vague and difficult to pin down, with contention as to whether classical algorithms should be categorised as AI.[401] During the early 2020s AI boom, many companies used the term as a marketing buzzword, often even if they did "not actually use AI in a material way".[402] There has been debate over whether large language models exhibit genuine intelligence or merely simulate it by imitating human text.[403]

Evaluating approaches to AI

No established unifying theory or paradigm has guided AI research for most of its history.[aa] The unprecedented success of statistical machine learning in the 2010s eclipsed all other approaches (so much so that some sources, especially in the business world, use the term "artificial intelligence" to mean "machine learning with neural networks"). This approach is mostly sub-symbolic, soft and narrow. Critics argue that the underlying questions (symbolic versus sub-symbolic, hard versus soft, general versus narrow) may have to be revisited by future generations of AI researchers.

Symbolic AI and its limits

Symbolic AI (or "GOFAI")[405] simulated the high-level conscious reasoning that people use when they solve puzzles, express legal reasoning and do mathematics. Symbolic programs were highly successful at "intelligent" tasks such as algebra or IQ tests. In the 1960s, Newell and Simon proposed the physical symbol systems hypothesis: "A physical symbol system has the necessary and sufficient means of general intelligent action."[406] However, the symbolic approach failed on many tasks that humans solve easily, such as learning, recognizing an object or commonsense reasoning.
Moravec's paradox is the discovery that high-level "intelligent" tasks were easy for AI, but low-level "instinctive" tasks were extremely difficult.[407] Philosopher Hubert Dreyfus had argued since the 1960s that human expertise depends on unconscious instinct rather than conscious symbol manipulation, and on having a "feel" for the situation, rather than explicit symbolic knowledge.[408] Although his arguments had been ridiculed and ignored when they were first presented, eventually, AI research came to agree with him.[ab][16]

The issue is not resolved: sub-symbolic reasoning can make many of the same inscrutable mistakes that human intuition does, such as algorithmic bias. Critics such as Noam Chomsky argue that continuing research into symbolic AI will still be necessary to attain general intelligence,[410][411] in part because sub-symbolic AI is a move away from explainable AI: it can be difficult or impossible to understand why a modern statistical AI program made a particular decision. The emerging field of neuro-symbolic artificial intelligence attempts to bridge the two approaches.

Neat vs. scruffy

Main article: Neats and scruffies

"Neats" hope that intelligent behavior is described using simple, elegant principles (such as logic, optimization, or neural networks). "Scruffies" expect that it necessarily requires solving a large number of unrelated problems. Neats defend their programs with theoretical rigor, while scruffies rely mainly on incremental testing to see if they work. This issue was actively discussed in the 1970s and 1980s,[412] but eventually was seen as irrelevant. Modern AI has elements of both.

Soft vs. hard computing

Main article: Soft computing

Finding a provably correct or optimal solution is intractable for many important problems.[15] Soft computing is a set of techniques, including genetic algorithms, fuzzy logic and neural networks, that are tolerant of imprecision, uncertainty, partial truth and approximation.
Soft computing was introduced in the late 1980s and most successful AI programs in the 21st century are examples of soft computing with neural networks. Narrow vs. general AI Main articles: Weak artificial intelligence and Artificial general intelligence AI researchers are divided as to whether to pursue the goals of artificial general intelligence and superintelligence directly or to solve as many specific problems as possible (narrow AI) in hopes these solutions will lead indirectly to the field's long-term goals.[413][414] General intelligence is difficult to define and difficult to measure, and modern AI has had more verifiable successes by focusing on specific problems with specific solutions. The sub-field of artificial general intelligence studies this area exclusively. Machine consciousness, sentience, and mind Main articles: Philosophy of artificial intelligence and Artificial consciousness There is no settled consensus in philosophy of mind on whether a machine can have a mind, consciousness and mental states in the same sense that human beings do. This issue considers the internal experiences of the machine, rather than its external behavior. Mainstream AI research considers this issue irrelevant because it does not affect the goals of the field: to build machines that can solve problems using intelligence. Russell and Norvig add that "[t]he additional project of making a machine conscious in exactly the way humans are is not one that we are equipped to take on."[415] However, the question has become central to the philosophy of mind. It is also typically the central question at issue in artificial intelligence in fiction. Consciousness Main articles: Hard problem of consciousness and Theory of mind David Chalmers identified two problems in understanding the mind, which he named the "hard" and "easy" problems of consciousness.[416] The easy problem is understanding how the brain processes signals, makes plans and controls behavior. 
The hard problem is explaining how this feels or why it should feel like anything at all, assuming we are right in thinking that it truly does feel like something (Dennett's consciousness illusionism says this is an illusion). While human information processing is easy to explain, human subjective experience is difficult to explain. For example, it is easy to imagine a color-blind person who has learned to identify which objects in their field of view are red, but it is not clear what would be required for the person to know what red looks like.[417] Computationalism and functionalism Main articles: Computational theory of mind and Functionalism (philosophy of mind) Computationalism is the position in the philosophy of mind that the human mind is an information processing system and that thinking is a form of computing. Computationalism argues that the relationship between mind and body is similar or identical to the relationship between software and hardware and thus may be a solution to the mind–body problem. 
This philosophical position was inspired by the work of AI researchers and cognitive scientists in the 1960s and was originally proposed by philosophers Jerry Fodor and Hilary Putnam.[418] Philosopher John Searle characterized this position as "strong AI": "The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds."[ac] Searle challenges this claim with his Chinese room argument, which attempts to show that even a computer capable of perfectly simulating human behavior would not have a mind.[422] AI welfare and rights It is difficult or impossible to reliably evaluate whether an advanced AI is sentient (has the ability to feel), and if so, to what degree.[423] But if there is a significant chance that a given machine can feel and suffer, then it may be entitled to certain rights or welfare protection measures, similarly to animals.[424][425] Sapience (a set of capacities related to high intelligence, such as discernment or self-awareness) may provide another moral basis for AI rights.[424] Robot rights are also sometimes proposed as a practical way to integrate autonomous agents into society.[426] In 2017, the European Union considered granting "electronic personhood" to some of the most capable AI systems. Similarly to the legal status of companies, it would have conferred rights but also responsibilities.[427] Critics argued in 2018 that granting rights to AI systems would downplay the importance of human rights, and that legislation should focus on user needs rather than speculative futuristic scenarios. They also noted that robots lacked the autonomy to take part in society on their own.[428][429] Progress in AI increased interest in the topic. Proponents of AI welfare and rights often argue that AI sentience, if it emerges, would be particularly easy to deny. 
They warn that this may be a moral blind spot analogous to slavery or factory farming, which could lead to large-scale suffering if sentient AI is created and carelessly exploited.[425][424] Future Superintelligence and the singularity A superintelligence is a hypothetical agent that would possess intelligence far surpassing that of the brightest and most gifted human mind.[414] If research into artificial general intelligence produced sufficiently intelligent software, it might be able to reprogram and improve itself. The improved software would be even better at improving itself, leading to what I. J. Good called an "intelligence explosion" and Vernor Vinge called a "singularity".[430] However, technologies cannot improve exponentially indefinitely, and typically follow an S-shaped curve, slowing when they reach the physical limits of what the technology can do.[431] Transhumanism Main article: Transhumanism Robot designer Hans Moravec, cyberneticist Kevin Warwick and inventor Ray Kurzweil have predicted that humans and machines may merge in the future into cyborgs that are more capable and powerful than either. This idea, called transhumanism, has roots in the writings of Aldous Huxley and Robert Ettinger.[432] Edward Fredkin argues that "artificial intelligence is the next step in evolution", an idea first proposed by Samuel Butler's "Darwin among the Machines" as far back as 1863, and expanded upon by George Dyson in his 1998 book Darwin Among the Machines: The Evolution of Global Intelligence.[433] In fiction Main article: Artificial intelligence in fiction The word "robot" itself was coined by Karel Čapek in his 1921 play R.U.R., the title standing for "Rossum's Universal Robots". Thought-capable artificial beings have appeared as storytelling devices since antiquity,[434] and have been a persistent theme in science fiction.[435] A common trope in these works began with Mary Shelley's Frankenstein, where a human creation becomes a threat to its masters. 
This includes such works as Arthur C. Clarke's and Stanley Kubrick's 2001: A Space Odyssey (both 1968), with HAL 9000, the murderous computer in charge of the Discovery One spaceship, as well as The Terminator (1984) and The Matrix (1999). In contrast, the rare loyal robots such as Gort from The Day the Earth Stood Still (1951) and Bishop from Aliens (1986) are less prominent in popular culture.[436] Isaac Asimov introduced the Three Laws of Robotics in many stories, most notably with the "Multivac" super-intelligent computer. Asimov's laws are often brought up during lay discussions of machine ethics;[437] while almost all artificial intelligence researchers are familiar with Asimov's laws through popular culture, they generally consider the laws useless for many reasons, one of which is their ambiguity.[438] Several works use AI to force us to confront the fundamental question of what makes us human, showing us artificial beings that have the ability to feel, and thus to suffer. This appears in Karel Čapek's R.U.R., the films A.I. Artificial Intelligence and Ex Machina, as well as the novel Do Androids Dream of Electric Sheep?, by Philip K. Dick. 
Dick considers the idea that our understanding of human subjectivity is altered by technology created with artificial intelligence.[439]

See also

Artificial consciousness – Field in cognitive science
Artificial intelligence and elections – Use and impact of AI on political elections
Artificial intelligence content detection – Software to detect AI-generated content
Association for the Advancement of Artificial Intelligence (AAAI)
Behavior selection algorithm – Algorithm that selects actions for intelligent agents
Business process automation – Automation of business processes
Case-based reasoning – Process of solving new problems based on the solutions of similar past problems
Computational intelligence – Ability of a computer to learn a specific task from data or experimental observation
Digital immortality – Hypothetical concept of storing a personality in digital form
Emergent algorithm – Algorithm exhibiting emergent behavior
Female gendering of AI technologies – Gender biases in digital technology
Glossary of artificial intelligence – List of definitions of terms and concepts commonly used in the study of artificial intelligence
Intelligence amplification – Use of information technology to augment human intelligence
Intelligent agent – Software agent which acts autonomously
Intelligent automation – Software process that combines robotic process automation and artificial intelligence
List of artificial intelligence journals
List of artificial intelligence projects
Mind uploading – Hypothetical process of digitally emulating a brain
Organoid intelligence – Use of brain cells and brain organoids for intelligent computing
Robotic process automation – Form of business process automation technology
The Last Day – 1967 Welsh science fiction novel
Wetware computer – Computer composed of organic material
DARWIN EU – A European Union initiative coordinated by the European Medicines Agency (EMA) to generate and utilize real-world evidence (RWE) to support the evaluation and supervision of medicines across the EU.

Explanatory notes

This list of intelligent traits is based on the topics covered by the major AI textbooks, including: Russell & Norvig (2021), Luger & Stubblefield (2004), Poole, Mackworth & Goebel (1998) and Nilsson (1998)
This list of tools is based on the topics covered by the major AI textbooks, including: Russell & Norvig (2021), Luger & Stubblefield (2004), Poole, Mackworth & Goebel (1998) and Nilsson (1998)
It is among the reasons that expert systems proved to be inefficient for capturing knowledge.[30][31]
"Rational agent" is a general term used in economics, philosophy and theoretical artificial intelligence. It can refer to anything that directs its behavior to accomplish goals, such as a person, an animal, a corporation, a nation, or in the case of AI, a computer program.
Alan Turing discussed the centrality of learning as early as 1950, in his classic paper "Computing Machinery and Intelligence".[42] In 1956, at the original Dartmouth AI summer conference, Ray Solomonoff wrote a report on unsupervised probabilistic machine learning: "An Inductive Inference Machine".[43]
See AI winter § Machine translation and the ALPAC report of 1966
Compared with symbolic logic, formal Bayesian inference is computationally expensive. For inference to be tractable, most observations must be conditionally independent of one another. AdSense uses a Bayesian network with over 300 million edges to learn which ads to serve.[94]
Expectation–maximization, one of the most popular algorithms in machine learning, allows clustering in the presence of unknown latent variables.[96]
Some form of deep neural networks (without a specific learning algorithm) were described by: Warren S.
McCulloch and Walter Pitts (1943);[117] Alan Turing (1948);[118] Karl Steinbuch and Roger David Joseph (1961).[119] Deep or recurrent networks that learned (or used gradient descent) were developed by: Frank Rosenblatt (1957);[118] Oliver Selfridge (1959);[119] Alexey Ivakhnenko and Valentin Lapa (1965);[120] Kaoru Nakano (1971);[121] Shun-Ichi Amari (1972);[121] John Joseph Hopfield (1982).[121] Precursors to backpropagation were developed by: Henry J. Kelley (1960);[118] Arthur E. Bryson (1962);[118] Stuart Dreyfus (1962);[118] Arthur E. Bryson and Yu-Chi Ho (1969).[118] Backpropagation was independently developed by: Seppo Linnainmaa (1970);[122] Paul Werbos (1974).[118]
Geoffrey Hinton said, of his work on neural networks in the 1990s, "our labeled datasets were thousands of times too small. [And] our computers were millions of times too slow."[123]
In statistics, a bias is a systematic error or deviation from the correct value. But in the context of fairness, it refers to a tendency in favor of or against a certain group or individual characteristic, usually in a way that is considered unfair or harmful. A statistically unbiased AI system that produces disparate outcomes for different demographic groups may thus be viewed as biased in the ethical sense.[252]
Including Jon Kleinberg (Cornell University), Sendhil Mullainathan (University of Chicago), Cynthia Chouldechova (Carnegie Mellon) and Sam Corbett-Davies (Stanford).[261]
Moritz Hardt (a director at the Max Planck Institute for Intelligent Systems) argues that machine learning "is fundamentally the wrong tool for a lot of domains, where you're trying to design interventions and mechanisms that change the world."[266]
When the law was passed in 2018, it still contained a form of this provision.
This is the United Nations' definition, and includes things like land mines as well.[280]
See table 4; 9% is both the OECD average and the U.S.
average.[291]
Sometimes called a "robopocalypse".[299]
"Electronic brain" was the term used by the press around this time.[353][355]
Daniel Crevier wrote, "the conference is generally recognized as the official birthdate of the new science."[358] Russell and Norvig called the conference "the inception of artificial intelligence."[117]
Russell and Norvig wrote "for the next 20 years the field would be dominated by these people and their students."[359]
Russell and Norvig wrote, "it was astonishing whenever a computer did anything kind of smartish".[360]
The programs described are Arthur Samuel's checkers program for the IBM 701, Daniel Bobrow's STUDENT, Newell and Simon's Logic Theorist and Terry Winograd's SHRDLU.
Russell and Norvig write: "in almost all cases, these early systems failed on more difficult problems"[364]
Embodied approaches to AI[371] were championed by Hans Moravec[372] and Rodney Brooks[373] and went by many names: Nouvelle AI[373] and developmental robotics.[374]
Matteo Wong wrote in The Atlantic: "Whereas for decades, computer-science fields such as natural-language processing, computer vision, and robotics used extremely different methods, now they all use a programming method called 'deep learning'. As a result, their code and approaches have become more similar, and their models are easier to integrate into one another."[380]
Jack Clark wrote in Bloomberg: "After a half-decade of quiet breakthroughs in artificial intelligence, 2015 has been a landmark year. Computers are smarter and learning faster than ever", and noted that the number of software projects that use machine learning at Google increased from a "sporadic usage" in 2012 to more than 2,700 projects in 2015.[382]
Nils Nilsson wrote in 1983: "Simply put, there is wide disagreement in the field about what AI is all about."[404]
Daniel Crevier wrote that "time has proven the accuracy and perceptiveness of some of Dreyfus's comments.
Had he formulated them less aggressively, constructive actions they suggested might have been taken much earlier."[409]
Searle presented this definition of "Strong AI" in 1999.[419] Searle's original formulation was "The appropriately programmed computer really is a mind, in the sense that computers given the right programs can be literally said to understand and have other cognitive states."[420] Strong AI is defined similarly by Russell and Norvig: "Strong AI – the assertion that machines that do so are actually thinking (as opposed to simulating thinking)."[421]

References

Russell & Norvig (2021), pp. 1–4. AI set to exceed human brain power Archived 2008-02-19 at the Wayback Machine CNN.com (July 26, 2006) Kaplan, Andreas; Haenlein, Michael (2019). "Siri, Siri, in my hand: Who's the fairest in the land? On the interpretations, illustrations, and implications of artificial intelligence". Business Horizons. 62: 15–25. doi:10.1016/j.bushor.2018.08.004. ISSN 0007-6813. S2CID 158433736. Russell & Norvig (2021, §1.2). "Tech companies want to build artificial general intelligence. But who decides when AGI is attained?". AP News. 4 April 2024. Retrieved 20 May 2025. Dartmouth workshop: Russell & Norvig (2021, p. 18), McCorduck (2004, pp. 111–136), NRC (1999, pp. 200–201) The proposal: McCarthy et al. (1955) Successful programs of the 1960s: McCorduck (2004, pp. 243–252), Crevier (1993, pp. 52–107), Moravec (1988, p. 9), Russell & Norvig (2021, pp. 19–21) Funding initiatives in the early 1980s: Fifth Generation Project (Japan), Alvey (UK), Microelectronics and Computer Technology Corporation (US), Strategic Computing Initiative (US): McCorduck (2004, pp. 426–441), Crevier (1993, pp. 161–162, 197–203, 211, 240), Russell & Norvig (2021, p. 23), NRC (1999, pp. 210–211), Newquist (1994, pp. 235–248) First AI Winter, Lighthill report, Mansfield Amendment: Crevier (1993, pp. 115–117), Russell & Norvig (2021, pp. 21–22), NRC (1999, pp. 212–213), Howe (1994), Newquist (1994, pp.
189–201) Second AI Winter: Russell & Norvig (2021, p. 24), McCorduck (2004, pp. 430–435), Crevier (1993, pp. 209–210), NRC (1999, pp. 214–216), Newquist (1994, pp. 301–318) Deep learning revolution, AlexNet: Goldman (2022), Russell & Norvig (2021, p. 26), McKinsey (2018) Toews (2023). Problem-solving, puzzle solving, game playing, and deduction: Russell & Norvig (2021, chpt. 3–5), Russell & Norvig (2021, chpt. 6) (constraint satisfaction), Poole, Mackworth & Goebel (1998, chpt. 2, 3, 7, 9), Luger & Stubblefield (2004, chpt. 3, 4, 6, 8), Nilsson (1998, chpt. 7–12) Uncertain reasoning: Russell & Norvig (2021, chpt. 12–18), Poole, Mackworth & Goebel (1998, pp. 345–395), Luger & Stubblefield (2004, pp. 333–381), Nilsson (1998, chpt. 7–12) Intractability and efficiency and the combinatorial explosion: Russell & Norvig (2021, p. 21) Psychological evidence of the prevalence of sub-symbolic reasoning and knowledge: Kahneman (2011), Dreyfus & Dreyfus (1986), Wason & Shapiro (1966), Kahneman, Slovic & Tversky (1982) Knowledge representation and knowledge engineering: Russell & Norvig (2021, chpt. 10), Poole, Mackworth & Goebel (1998, pp. 23–46, 69–81, 169–233, 235–277, 281–298, 319–345), Luger & Stubblefield (2004, pp. 227–243), Nilsson (1998, chpt. 17.1–17.4, 18) Smoliar & Zhang (1994). Neumann & Möller (2008). Kuperman, Reichley & Bailey (2006). McGarry (2005). Bertini, Del Bimbo & Torniai (2006). Russell & Norvig (2021), pp. 272. Representing categories and relations: Semantic networks, description logics, inheritance (including frames, and scripts): Russell & Norvig (2021, §10.2 & 10.5), Poole, Mackworth & Goebel (1998, pp. 174–177), Luger & Stubblefield (2004, pp. 248–258), Nilsson (1998, chpt. 18.3) Representing events and time: Situation calculus, event calculus, fluent calculus (including solving the frame problem): Russell & Norvig (2021, §10.3), Poole, Mackworth & Goebel (1998, pp. 281–298), Nilsson (1998, chpt.
18.2) Causal calculus: Poole, Mackworth & Goebel (1998, pp. 335–337) Representing knowledge about knowledge: Belief calculus, modal logics: Russell & Norvig (2021, §10.4), Poole, Mackworth & Goebel (1998, pp. 275–277) Default reasoning, Frame problem, default logic, non-monotonic logics, circumscription, closed world assumption, abduction: Russell & Norvig (2021, §10.6), Poole, Mackworth & Goebel (1998, pp. 248–256, 323–335), Luger & Stubblefield (2004, pp. 335–363), Nilsson (1998, ~18.3.3) (Poole et al. places abduction under "default reasoning". Luger et al. places this under "uncertain reasoning"). Breadth of commonsense knowledge: Lenat & Guha (1989, Introduction), Crevier (1993, pp. 113–114), Moravec (1988, p. 13), Russell & Norvig (2021, pp. 241, 385, 982) (qualification problem) Newquist (1994), p. 296. Crevier (1993), pp. 204–208. Russell & Norvig (2021), p. 528. Automated planning: Russell & Norvig (2021, chpt. 11). Automated decision making, Decision theory: Russell & Norvig (2021, chpt. 16–18). Classical planning: Russell & Norvig (2021, Section 11.2). Sensorless or "conformant" planning, contingent planning, replanning (a.k.a. online planning): Russell & Norvig (2021, Section 11.5). Uncertain preferences: Russell & Norvig (2021, Section 16.7) Inverse reinforcement learning: Russell & Norvig (2021, Section 22.6) Information value theory: Russell & Norvig (2021, Section 16.6). Markov decision process: Russell & Norvig (2021, chpt. 17). Game theory and multi-agent decision theory: Russell & Norvig (2021, chpt. 18). Learning: Russell & Norvig (2021, chpt. 19–22), Poole, Mackworth & Goebel (1998, pp. 397–438), Luger & Stubblefield (2004, pp. 385–542), Nilsson (1998, chpt. 3.3, 10.3, 17.5, 20) Turing (1950). Solomonoff (1956). Unsupervised learning: Russell & Norvig (2021, pp. 653) (definition), Russell & Norvig (2021, pp. 738–740) (cluster analysis), Russell & Norvig (2021, pp. 
846–860) (word embedding) Supervised learning: Russell & Norvig (2021, §19.2) (Definition), Russell & Norvig (2021, Chpt. 19–20) (Techniques) Reinforcement learning: Russell & Norvig (2021, chpt. 22), Luger & Stubblefield (2004, pp. 442–449) Transfer learning: Russell & Norvig (2021, pp. 281), The Economist (2016) "Artificial Intelligence (AI): What Is AI and How Does It Work? | Built In". builtin.com. Retrieved 30 October 2023. Computational learning theory: Russell & Norvig (2021, pp. 672–674), Jordan & Mitchell (2015) Natural language processing (NLP): Russell & Norvig (2021, chpt. 23–24), Poole, Mackworth & Goebel (1998, pp. 91–104), Luger & Stubblefield (2004, pp. 591–632) Subproblems of NLP: Russell & Norvig (2021, pp. 849–850) Russell & Norvig (2021), pp. 856–858. Dickson (2022). Modern statistical and deep learning approaches to NLP: Russell & Norvig (2021, chpt. 24), Cambria & White (2014) Vincent (2019). Russell & Norvig (2021), pp. 875–878. Bushwick (2023). Computer vision: Russell & Norvig (2021, chpt. 25), Nilsson (1998, chpt. 6) Russell & Norvig (2021), pp. 849–850. Russell & Norvig (2021), pp. 895–899. Russell & Norvig (2021), pp. 899–901. Challa et al. (2011). Russell & Norvig (2021), pp. 931–938. MIT AIL (2014). Affective computing: Thro (1993), Edelson (1991), Tao & Tan (2005), Scassellati (2002) Waddell (2018). Poria et al. (2017). Artificial general intelligence: Russell & Norvig (2021, pp. 32–33, 1020–1021) Proposal for the modern version: Pennachin & Goertzel (2007) Warnings of overspecialization in AI from leading researchers: Nilsson (1995), McCarthy (2007), Beal & Winston (2009) Search algorithms: Russell & Norvig (2021, chpts. 3–5), Poole, Mackworth & Goebel (1998, pp. 113–163), Luger & Stubblefield (2004, pp. 79–164, 193–219), Nilsson (1998, chpts. 7–12) State space search: Russell & Norvig (2021, chpt. 3) Russell & Norvig (2021), sect. 11.2. 
Uninformed searches (breadth first search, depth-first search and general state space search): Russell & Norvig (2021, sect. 3.4), Poole, Mackworth & Goebel (1998, pp. 113–132), Luger & Stubblefield (2004, pp. 79–121), Nilsson (1998, chpt. 8) Heuristic or informed searches (e.g., greedy best first and A*): Russell & Norvig (2021, sect. 3.5), Poole, Mackworth & Goebel (1998, pp. 132–147), Poole & Mackworth (2017, sect. 3.6), Luger & Stubblefield (2004, pp. 133–150) Adversarial search: Russell & Norvig (2021, chpt. 5) Local or "optimization" search: Russell & Norvig (2021, chpt. 4) Singh Chauhan, Nagesh (18 December 2020). "Optimization Algorithms in Neural Networks". KDnuggets. Retrieved 13 January 2024. Evolutionary computation: Russell & Norvig (2021, sect. 4.1.2) Merkle & Middendorf (2013). Logic: Russell & Norvig (2021, chpts. 6–9), Luger & Stubblefield (2004, pp. 35–77), Nilsson (1998, chpt. 13–16) Propositional logic: Russell & Norvig (2021, chpt. 6), Luger & Stubblefield (2004, pp. 45–50), Nilsson (1998, chpt. 13) First-order logic and features such as equality: Russell & Norvig (2021, chpt. 7), Poole, Mackworth & Goebel (1998, pp. 268–275), Luger & Stubblefield (2004, pp. 50–62), Nilsson (1998, chpt. 15) Logical inference: Russell & Norvig (2021, chpt. 10) logical deduction as search: Russell & Norvig (2021, sects. 9.3, 9.4), Poole, Mackworth & Goebel (1998, pp. ~46–52), Luger & Stubblefield (2004, pp. 62–73), Nilsson (1998, chpt. 4.2, 7.2) Resolution and unification: Russell & Norvig (2021, sections 7.5.2, 9.2, 9.5) Warren, D.H.; Pereira, L.M.; Pereira, F. (1977). "Prolog-the language and its implementation compared with Lisp". ACM SIGPLAN Notices. 12 (8): 109–115. doi:10.1145/872734.806939. Fuzzy logic: Russell & Norvig (2021, pp. 214, 255, 459), Scientific American (1999) Stochastic methods for uncertain reasoning: Russell & Norvig (2021, chpt. 12–18, 20), Poole, Mackworth & Goebel (1998, pp. 345–395), Luger & Stubblefield (2004, pp. 
165–191, 333–381), Nilsson (1998, chpt. 19) decision theory and decision analysis: Russell & Norvig (2021, chpt. 16–18), Poole, Mackworth & Goebel (1998, pp. 381–394) Information value theory: Russell & Norvig (2021, sect. 16.6) Markov decision processes and dynamic decision networks: Russell & Norvig (2021, chpt. 17) Stochastic temporal models: Russell & Norvig (2021, chpt. 14) Hidden Markov model: Russell & Norvig (2021, sect. 14.3) Kalman filters: Russell & Norvig (2021, sect. 14.4) Dynamic Bayesian networks: Russell & Norvig (2021, sect. 14.5) Game theory and mechanism design: Russell & Norvig (2021, chpt. 18) Bayesian networks: Russell & Norvig (2021, sects. 12.5–12.6, 13.4–13.5, 14.3–14.5, 16.5, 20.2–20.3), Poole, Mackworth & Goebel (1998, pp. 361–381), Luger & Stubblefield (2004, pp. ~182–190, ≈363–379), Nilsson (1998, chpt. 19.3–19.4) Domingos (2015), chpt. 6. Bayesian inference algorithm: Russell & Norvig (2021, sect. 13.3–13.5), Poole, Mackworth & Goebel (1998, pp. 361–381), Luger & Stubblefield (2004, pp. ~363–379), Nilsson (1998, chpt. 19.4 & 7) Domingos (2015), p. 210. Bayesian learning and the expectation–maximization algorithm: Russell & Norvig (2021, chpt. 20), Poole, Mackworth & Goebel (1998, pp. 424–433), Nilsson (1998, chpt. 20), Domingos (2015, p. 210) Bayesian decision theory and Bayesian decision networks: Russell & Norvig (2021, sect. 16.5) Statistical learning methods and classifiers: Russell & Norvig (2021, chpt. 20), Ciaramella, Alberto; Ciaramella, Marco (2024). Introduction to Artificial Intelligence: from data analysis to generative AI. Intellisemantic Editions. ISBN 978-8-8947-8760-3. Decision trees: Russell & Norvig (2021, sect. 19.3), Domingos (2015, p. 88) Non-parametric learning models such as K-nearest neighbor and support vector machines: Russell & Norvig (2021, sect. 19.7), Domingos (2015, p. 187) (k-nearest neighbor) Domingos (2015, p. 88) (kernel methods) Domingos (2015), p. 152.
Naive Bayes classifier: Russell & Norvig (2021, sect. 12.6), Domingos (2015, p. 152) Neural networks: Russell & Norvig (2021, chpt. 21), Domingos (2015, Chapter 4) Gradient calculation in computational graphs, backpropagation, automatic differentiation: Russell & Norvig (2021, sect. 21.2), Luger & Stubblefield (2004, pp. 467–474), Nilsson (1998, chpt. 3.3) Universal approximation theorem: Russell & Norvig (2021, p. 752) The theorem: Cybenko (1988), Hornik, Stinchcombe & White (1989) Feedforward neural networks: Russell & Norvig (2021, sect. 21.1) Perceptrons: Russell & Norvig (2021, pp. 21, 22, 683, 22) Deep learning: Russell & Norvig (2021, chpt. 21), Goodfellow, Bengio & Courville (2016), Hinton et al. (2016), Schmidhuber (2015) Recurrent neural networks: Russell & Norvig (2021, sect. 21.6) Convolutional neural networks: Russell & Norvig (2021, sect. 21.3) Sindhu V, Nivedha S, Prakash M (February 2020). "An Empirical Science Research on Bioinformatics in Machine Learning". Journal of Mechanics of Continua and Mathematical Sciences (7). doi:10.26782/jmcms.spl.7/2020.02.00006. Deng & Yu (2014), pp. 199–200. Ciresan, Meier & Schmidhuber (2012). Russell & Norvig (2021), p. 750. Russell & Norvig (2021), p. 17. Russell & Norvig (2021), p. 785. Schmidhuber (2022), sect. 5. Schmidhuber (2022), sect. 6. Schmidhuber (2022), sect. 7. Schmidhuber (2022), sect. 8. Quoted in Christian (2020, p. 22) Metz, Cade; Weise, Karen (5 May 2025). "A.I. Hallucinations Are Getting Worse, Even as New Systems Become More Powerful". The New York Times. ISSN 0362-4331. Retrieved 6 May 2025. Smith (2023). "Explained: Generative AI". 9 November 2023. "AI Writing and Content Creation Tools". MIT Sloan Teaching & Learning Technologies. Archived from the original on 25 December 2023. Retrieved 25 December 2023. Marmouyet (2023). Kobielus (2019). Thomason, James (21 May 2024). "Mojo Rising: The resurgence of AI-first programming languages". VentureBeat. Archived from the original on 27 June 2024. 
Retrieved 26 May 2024. Wodecki, Ben (5 May 2023). "7 AI Programming Languages You Need to Know". AI Business. Archived from the original on 25 July 2024. Retrieved 5 October 2024. Plumb, Taryn (18 September 2024). "Why Jensen Huang and Marc Benioff see 'gigantic' opportunity for agentic AI". VentureBeat. Archived from the original on 5 October 2024. Retrieved 4 October 2024. Mims, Christopher (19 September 2020). "Huang's Law Is the New Moore's Law, and Explains Why Nvidia Wants Arm". Wall Street Journal. ISSN 0099-9660. Archived from the original on 2 October 2023. Retrieved 19 January 2025. Davenport, T; Kalakota, R (June 2019). "The potential for artificial intelligence in healthcare". Future Healthc J. 6 (2): 94–98. doi:10.7861/futurehosp.6-2-94. PMC 6616181. PMID 31363513. Lyakhova, U.A.; Lyakhov, P.A. (2024). "Systematic review of approaches to detection and classification of skin cancer using artificial intelligence: Development and prospects". Computers in Biology and Medicine. 178: 108742. doi:10.1016/j.compbiomed.2024.108742. PMID 38875908. Archived from the original on 3 December 2024. Retrieved 10 October 2024. Alqudaihi, Kawther S.; Aslam, Nida; Khan, Irfan Ullah; Almuhaideb, Abdullah M.; Alsunaidi, Shikah J.; Ibrahim, Nehad M. Abdel Rahman; Alhaidari, Fahd A.; Shaikh, Fatema S.; Alsenbel, Yasmine M.; Alalharith, Dima M.; Alharthi, Hajar M.; Alghamdi, Wejdan M.; Alshahrani, Mohammed S. (2021). "Cough Sound Detection and Diagnosis Using Artificial Intelligence Techniques: Challenges and Opportunities". IEEE Access. 9: 102327–102344. Bibcode:2021IEEEA...9j2327A. doi:10.1109/ACCESS.2021.3097559. ISSN 2169-3536. PMC 8545201. PMID 34786317. Bax, Monique; Thorpe, Jordan; Romanov, Valentin (December 2023). "The future of personalized cardiovascular medicine demands 3D and 4D printing, stem cells, and artificial intelligence". Frontiers in Sensors. 4. doi:10.3389/fsens.2023.1294721. ISSN 2673-5067. Dankwa-Mullan, Irene (2024). 
"Health Equity and Ethical Considerations in Using Artificial Intelligence in Public Health and Medicine". Preventing Chronic Disease. 21: E64. doi:10.5888/pcd21.240245. ISSN 1545-1151. PMC 11364282. PMID 39173183. Jumper, J; Evans, R; Pritzel, A (2021). "Highly accurate protein structure prediction with AlphaFold". Nature. 596 (7873): 583–589. Bibcode:2021Natur.596..583J. doi:10.1038/s41586-021-03819-2. PMC 8371605. PMID 34265844. "AI discovers new class of antibiotics to kill drug-resistant bacteria". 20 December 2023. Archived from the original on 16 September 2024. Retrieved 5 October 2024. "AI speeds up drug design for Parkinson's ten-fold". Cambridge University. 17 April 2024. Archived from the original on 5 October 2024. Retrieved 5 October 2024. Horne, Robert I.; Andrzejewska, Ewa A.; Alam, Parvez; Brotzakis, Z. Faidon; Srivastava, Ankit; Aubert, Alice; Nowinska, Magdalena; Gregory, Rebecca C.; Staats, Roxine; Possenti, Andrea; Chia, Sean; Sormanni, Pietro; Ghetti, Bernardino; Caughey, Byron; Knowles, Tuomas P. J.; Vendruscolo, Michele (17 April 2024). "Discovery of potent inhibitors of α-synuclein aggregation using structure-based iterative learning". Nature Chemical Biology. 20 (5). Nature: 634–645. doi:10.1038/s41589-024-01580-x. PMC 11062903. PMID 38632492. Grant, Eugene F.; Lardner, Rex (25 July 1952). "The Talk of the Town – It". The New Yorker. ISSN 0028-792X. Archived from the original on 16 February 2020. Retrieved 28 January 2024. Anderson, Mark Robert (11 May 2017). "Twenty years on from Deep Blue vs Kasparov: how a chess match started the big data revolution". The Conversation. Archived from the original on 17 September 2024. Retrieved 28 January 2024. Markoff, John (16 February 2011). "Computer Wins on 'Jeopardy!': Trivial, It's Not". The New York Times. ISSN 0362-4331. Archived from the original on 22 October 2014. Retrieved 28 January 2024. Byford, Sam (27 May 2017). "AlphaGo retires from competitive Go after defeating world number one 3–0". 
The Verge. Archived from the original on 7 June 2017. Retrieved 28 January 2024. Brown, Noam; Sandholm, Tuomas (30 August 2019). "Superhuman AI for multiplayer poker". Science. 365 (6456): 885–890. Bibcode:2019Sci...365..885B. doi:10.1126/science.aay2400. ISSN 0036-8075. PMID 31296650. "MuZero: Mastering Go, chess, shogi and Atari without rules". Google DeepMind. 23 December 2020. Retrieved 28 January 2024. Sample, Ian (30 October 2019). "AI becomes grandmaster in 'fiendishly complex' StarCraft II". The Guardian. ISSN 0261-3077. Archived from the original on 29 December 2020. Retrieved 28 January 2024. Wurman, P. R.; Barrett, S.; Kawamoto, K. (2022). "Outracing champion Gran Turismo drivers with deep reinforcement learning" (PDF). Nature. 602 (7896): 223–228. Bibcode:2022Natur.602..223W. doi:10.1038/s41586-021-04357-7. PMID 35140384. Wilkins, Alex (13 March 2024). "Google AI learns to play open-world video games by watching them". New Scientist. Archived from the original on 26 July 2024. Retrieved 21 July 2024. Wu, Zhengxuan; Arora, Aryaman; Wang, Zheng; Geiger, Atticus; Jurafsky, Dan; Manning, Christopher D.; Potts, Christopher (2024). "ReFT: Representation Finetuning for Language Models". NeurIPS. arXiv:2404.03592. "Improving mathematical reasoning with process supervision". OpenAI. 31 May 2023. Retrieved 26 January 2025. Srivastava, Saurabh (29 February 2024). "Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap". arXiv:2402.19450 [cs.AI]. Lightman, Hunter; Kosaraju, Vineet; Burda, Yura; Edwards, Harri; Baker, Bowen; Lee, Teddy; Leike, Jan; Schulman, John; Sutskever, Ilya; Cobbe, Karl (2023). "Let's Verify Step by Step". arXiv:2305.20050v1 [cs.LG]. Franzen, Carl (8 August 2024). "Alibaba claims no. 1 spot in AI math models with Qwen2-Math". VentureBeat. Retrieved 16 February 2025. Franzen, Carl (9 January 2025).
"Microsoft's new rStar-Math technique upgrades small models to outperform OpenAI's o1-preview at math problems". VentureBeat. Retrieved 26 January 2025. Gina Genkina: New AI Model Advances the “Kissing Problem” and More. AlphaEvolve made several mathematical discoveries and practical optimizations IEEE Spectrum 2025-05-14. Retrieved 2025-06-07 Roberts, Siobhan (25 July 2024). "AI achieves silver-medal standard solving International Mathematical Olympiad problems". The New York Times. Archived from the original on 26 September 2024. Retrieved 7 August 2024. Azerbayev, Zhangir; Schoelkopf, Hailey; Paster, Keiran; Santos, Marco Dos; McAleer, Stephen; Jiang, Albert Q.; Deng, Jia; Biderman, Stella; Welleck, Sean (16 October 2023). "Llemma: An Open Language Model For Mathematics". EleutherAI Blog. Retrieved 26 January 2025. "Julius AI". julius.ai. Metz, Cade (21 July 2025). "Google A.I. System Wins Gold Medal in International Math Olympiad". The New York Times. ISSN 0362-4331. Retrieved 24 July 2025. McFarland, Alex (12 July 2024). "8 Best AI for Math Tools (January 2025)". Unite.AI. Retrieved 26 January 2025. Matthew Finio & Amanda Downie: IBM Think 2024 Primer, "What is Artificial Intelligence (AI) in Finance?" 8 Dec. 2023 M. Nicolas, J. Firzli: Pensions Age / European Pensions magazine, "Artificial Intelligence: Ask the Industry", May–June 2024. https://videovoice.org/ai-in-finance-innovation-entrepreneurship-vs-over-regulation-with-the-eus-artificial-intelligence-act-wont-work-as-intended/ Archived 11 September 2024 at the Wayback Machine. Congressional Research Service (2019). Artificial Intelligence and National Security (PDF). Washington, DC: Congressional Research Service. Archived (PDF) from the original on 8 May 2020. Retrieved 25 February 2024. Slyusar, Vadym (2019). Artificial intelligence as the basis of future control networks (Preprint). doi:10.13140/RG.2.2.30247.50087. Iraqi, Amjad (3 April 2024).
"'Lavender': The AI machine directing Israel's bombing spree in Gaza". +972 Magazine. Archived from the original on 10 October 2024. Retrieved 6 April 2024. Davies, Harry; McKernan, Bethan; Sabbagh, Dan (1 December 2023). "'The Gospel': how Israel uses AI to select bombing targets in Gaza". The Guardian. Archived from the original on 6 December 2023. Retrieved 4 December 2023. Marti, J Werner (10 August 2024). "Drohnen haben den Krieg in der Ukraine revolutioniert, doch sie sind empfindlich auf Störsender – deshalb sollen sie jetzt autonom operieren". Neue Zürcher Zeitung (in German). Archived from the original on 10 August 2024. Retrieved 10 August 2024. Newsom, Gavin; Weber, Shirley N. (5 September 2023). "Executive Order N-12-23" (PDF). Executive Department, State of California. Archived (PDF) from the original on 21 February 2024. Retrieved 7 September 2023. Pinaya, Walter H. L.; Graham, Mark S.; Kerfoot, Eric; Tudosiu, Petru-Daniel; Dafflon, Jessica; Fernandez, Virginia; Sanchez, Pedro; Wolleb, Julia; da Costa, Pedro F.; Patel, Ashay (2023). "Generative AI for Medical Imaging: extending the MONAI Framework". arXiv:2307.15208 [eess.IV]. "What is ChatGPT, DALL-E, and generative AI?". McKinsey. Archived from the original on 23 April 2023. Retrieved 14 December 2024. "What is generative AI?". IBM. 22 March 2024. Archived from the original on 13 December 2024. Retrieved 13 December 2024. Pasick, Adam (27 March 2023). "Artificial Intelligence Glossary: Neural Networks and Other Terms Explained". The New York Times. ISSN 0362-4331. Archived from the original on 1 September 2023. Retrieved 22 April 2023. Karpathy, Andrej; Abbeel, Pieter; Brockman, Greg; Chen, Peter; Cheung, Vicki; Duan, Yan; Goodfellow, Ian; Kingma, Durk; Ho, Jonathan; Rein Houthooft; Tim Salimans; John Schulman; Ilya Sutskever; Wojciech Zaremba (16 June 2016). "Generative models". OpenAI. Archived from the original on 17 November 2023. Retrieved 15 March 2023. 
Griffith, Erin; Metz, Cade (27 January 2023). "Anthropic Said to Be Closing In on $300 Million in New A.I. Funding". The New York Times. Archived from the original on 9 December 2023. Retrieved 14 March 2023. Lanxon, Nate; Bass, Dina; Davalos, Jackie (10 March 2023). "A Cheat Sheet to AI Buzzwords and Their Meanings". Bloomberg News. Archived from the original on 17 November 2023. Retrieved 14 March 2023. Metz, Cade (14 March 2023). "OpenAI Plans to Up the Ante in Tech's A.I. Race". The New York Times. ISSN 0362-4331. Archived from the original on 31 March 2023. Retrieved 31 March 2023. Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv (20 January 2022). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239 [cs.CL]. Roose, Kevin (21 October 2022). "A Coming-Out Party for Generative A.I., Silicon Valley's New Craze". The New York Times. Archived from the original on 15 February 2023. Retrieved 14 March 2023. Metz, Cade (15 February 2024). "OpenAI Unveils A.I. That Instantly Generates Eye-Popping Videos". The New York Times. ISSN 0362-4331. Archived from the original on 15 February 2024. Retrieved 16 February 2024. Fink, Charlie. "LTX Video Breaks The 60-Second Barrier, Redefining AI Video As A Longform Medium". Forbes. Retrieved 24 July 2025. "The race of the AI labs heats up". The Economist. 30 January 2023. Archived from the original on 17 November 2023. Retrieved 14 March 2023. Yang, June; Gokturk, Burak (14 March 2023). "Google Cloud brings generative AI to developers, businesses, and governments". Archived from the original on 17 November 2023. Retrieved 15 March 2023. Taeihagh, Araz (4 April 2025). "Governance of Generative AI". Policy and Society. 44 (1): 1–22. doi:10.1093/polsoc/puaf001. ISSN 1449-4035. Simon, Felix M.; Altay, Sacha; Mercier, Hugo (18 October 2023). "Misinformation reloaded? Fears about the impact of generative AI on misinformation are overblown" (PDF). 
Harvard Kennedy School Misinformation Review. doi:10.37016/mr-2020-127. S2CID 264113883. Retrieved 16 November 2023. Hendrix, Justin (16 May 2023). "Transcript: Senate Judiciary Subcommittee Hearing on Oversight of AI". techpolicy.press. Archived from the original on 17 November 2023. Retrieved 19 May 2023. "New AI systems collide with copyright law". BBC News. 1 August 2023. Retrieved 28 September 2024. Poole, David; Mackworth, Alan (2023). Artificial Intelligence, Foundations of Computational Agents (3rd ed.). Cambridge University Press. doi:10.1017/9781009258227. ISBN 978-1-0092-5819-7. Archived from the original on 5 October 2024. Retrieved 5 October 2024. Russell, Stuart; Norvig, Peter (2020). Artificial Intelligence: A Modern Approach (4th ed.). Pearson. ISBN 978-0-1346-1099-3. "Why agents are the next frontier of generative AI". McKinsey Digital. 24 July 2024. Archived from the original on 3 October 2024. Retrieved 10 August 2024. Figueiredo, Mayara Costa; Ankrah, Elizabeth; Powell, Jacquelyn E.; Epstein, Daniel A.; Chen, Yunan (12 January 2024). "Powered by AI: Examining How AI Descriptions Influence Perceptions of Fertility Tracking Applications". Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies. 7 (4): 1–24. doi:10.1145/3631414. Power, Jennifer; Pym, Tinonee; James, Alexandra; Waling, Andrea (5 July 2024). "Smart Sex Toys: A Narrative Review of Recent Research on Cultural, Health and Safety Considerations". Current Sexual Health Reports. 16 (3): 199–215. doi:10.1007/s11930-024-00392-3. ISSN 1548-3592. Marcantonio, Tiffany L.; Avery, Gracie; Thrash, Anna; Leone, Ruschelle M. (10 September 2024). "Large Language Models in an App: Conducting a Qualitative Synthetic Data Analysis of How Snapchat's "My AI" Responds to Questions About Sexual Consent, Sexual Refusals, Sexual Assault, and Sexting". The Journal of Sex Research: 1–15. doi:10.1080/00224499.2024.2396457. ISSN 0022-4499. PMC 11891083. PMID 39254628. 
Archived from the original on 9 December 2024. Retrieved 9 December 2024. Hanson, Kenneth R.; Bolthouse, Hannah (2024). ""Replika Removing Erotic Role-Play Is Like Grand Theft Auto Removing Guns or Cars": Reddit Discourse on Artificial Intelligence Chatbots and Sexual Technologies". Socius: Sociological Research for a Dynamic World. 10. doi:10.1177/23780231241259627. ISSN 2378-0231. Mania, Karolina (1 January 2024). "Legal Protection of Revenge and Deepfake Porn Victims in the European Union: Findings From a Comparative Legal Study". Trauma, Violence, & Abuse. 25 (1): 117–129. doi:10.1177/15248380221143772. ISSN 1524-8380. PMID 36565267. Singh, Suyesha; Nambiar, Vaishnavi (2024). "Role of Artificial Intelligence in the Prevention of Online Child Sexual Abuse: A Systematic Review of Literature". Journal of Applied Security Research. 19 (4): 586–627. doi:10.1080/19361610.2024.2331885. ISSN 1936-1610. Archived from the original on 9 December 2024. Retrieved 9 December 2024. Razi, Afsaneh; Kim, Seunghyun; Alsoubai, Ashwaq; Stringhini, Gianluca; Solorio, Thamar; De Choudhury, Munmun; Wisniewski, Pamela J. (13 October 2021). "A Human-Centered Systematic Literature Review of the Computational Approaches for Online Sexual Risk Detection". Proceedings of the ACM on Human-Computer Interaction. 5 (CSCW2): 1–38. doi:10.1145/3479609. ISSN 2573-0142. Archived from the original on 9 December 2024. Retrieved 9 December 2024. Ransbotham, Sam; Kiron, David; Gerbert, Philipp; Reeves, Martin (6 September 2017). "Reshaping Business With Artificial Intelligence". MIT Sloan Management Review. Archived from the original on 13 February 2024. Sun, Yuran; Zhao, Xilei; Lovreglio, Ruggiero; Kuligowski, Erica (1 January 2024), Naser, M. Z. 
(ed.), "8 – AI for large-scale evacuation modeling: promises and challenges", Interpretable Machine Learning for the Analysis, Design, Assessment, and Informed Decision Making for Civil Infrastructure, Woodhead Publishing Series in Civil and Structural Engineering, Woodhead Publishing, pp. 185–204, ISBN 978-0-1282-4073-1, archived from the original on 19 May 2024, retrieved 28 June 2024. Gomaa, Islam; Adelzadeh, Masoud; Gwynne, Steven; Spencer, Bruce; Ko, Yoon; Bénichou, Noureddine; Ma, Chunyun; Elsagan, Nour; Duong, Dana; Zalok, Ehab; Kinateder, Max (1 November 2021). "A Framework for Intelligent Fire Detection and Evacuation System". Fire Technology. 57 (6): 3179–3185. doi:10.1007/s10694-021-01157-3. ISSN 1572-8099. Archived from the original on 5 October 2024. Retrieved 5 October 2024. Zhao, Xilei; Lovreglio, Ruggiero; Nilsson, Daniel (1 May 2020). "Modelling and interpreting pre-evacuation decision-making using machine learning". Automation in Construction. 113: 103140. doi:10.1016/j.autcon.2020.103140. hdl:10179/17315. ISSN 0926-5805. Archived from the original on 19 May 2024. Retrieved 5 October 2024. "India's latest election embraced AI technology. Here are some ways it was used constructively". PBS News. 12 June 2024. Archived from the original on 17 September 2024. Retrieved 28 October 2024. "Экономист Дарон Асемоглу написал книгу об угрозах искусственного интеллекта — и о том, как правильное управление может обратить его на пользу человечеству Спецкор "Медузы" Маргарита Лютова узнала у ученого, как скоро мир сможет приблизиться к этой утопии". Meduza (in Russian). Archived from the original on 20 June 2023. Retrieved 21 June 2023. "Learning, thinking, artistic collaboration and other such human endeavours in the age of AI". The Hindu. 2 June 2023. Archived from the original on 21 June 2023. Retrieved 21 June 2023. Müller, Vincent C. (30 April 2020). "Ethics of Artificial Intelligence and Robotics". Stanford Encyclopedia of Philosophy Archive. 
Archived from the original on 5 October 2024. Retrieved 5 October 2024. Simonite (2016). Russell & Norvig (2021), p. 987. "Assessing potential future artificial intelligence risks, benefits and policy imperatives". OECD. 14 November 2024. Retrieved 1 August 2025. Laskowski (2023). GAO (2022). Valinsky (2019). Russell & Norvig (2021), p. 991. Russell & Norvig (2021), pp. 991–992. Christian (2020), p. 63. Vincent (2022). Kopel, Matthew. "Copyright Services: Fair Use". Cornell University Library. Archived from the original on 26 September 2024. Retrieved 26 April 2024. Burgess, Matt. "How to Stop Your Data From Being Used to Train AI". Wired. ISSN 1059-1028. Archived from the original on 3 October 2024. Retrieved 26 April 2024. Reisner (2023). Alter & Harris (2023). "Getting the Innovation Ecosystem Ready for AI. An IP policy toolkit" (PDF). WIPO. Hammond, George (27 December 2023). "Big Tech is spending more than VC firms on AI startups". Ars Technica. Archived from the original on 10 January 2024. Wong, Matteo (24 October 2023). "The Future of AI Is GOMA". The Atlantic. Archived from the original on 5 January 2024. "Big tech and the pursuit of AI dominance". The Economist. 26 March 2023. Archived from the original on 29 December 2023. Fung, Brian (19 December 2023). "Where the battle to dominate AI may be won". CNN Business. Archived from the original on 13 January 2024. Metz, Cade (5 July 2023). "In the Age of A.I., Tech's Little Guys Need Big Friends". The New York Times. Archived from the original on 8 July 2024. Retrieved 5 October 2024. "Electricity 2024 – Analysis". IEA. 24 January 2024. Retrieved 13 July 2024. Calvert, Brian (28 March 2024). "AI already uses as much energy as a small country. It's only the beginning". Vox. New York, New York. Archived from the original on 3 July 2024. Retrieved 5 October 2024. Halper, Evan; O'Donovan, Caroline (21 June 2024). "AI is exhausting the power grid. Tech firms are seeking a miracle solution". Washington Post. 
Davenport, Carly. "AI Data Centers and the Coming US Power Demand Surge" (PDF). Goldman Sachs. Archived from the original (PDF) on 26 July 2024. Retrieved 5 October 2024. Ryan, Carol (12 April 2024). "Energy-Guzzling AI Is Also the Future of Energy Savings". Wall Street Journal. Dow Jones. Hiller, Jennifer (1 July 2024). "Tech Industry Wants to Lock Up Nuclear Power for AI". Wall Street Journal. Dow Jones. Archived from the original on 5 October 2024. Retrieved 5 October 2024. Kendall, Tyler (28 September 2024). "Nvidia's Huang Says Nuclear Power an Option to Feed Data Centers". Bloomberg. Halper, Evan (20 September 2024). "Microsoft deal would reopen Three Mile Island nuclear plant to power AI". Washington Post. Hiller, Jennifer (20 September 2024). "Three Mile Island's Nuclear Plant to Reopen, Help Power Microsoft's AI Centers". Wall Street Journal. Dow Jones. Archived from the original on 5 October 2024. Retrieved 5 October 2024. Yadav, Niva (19 August 2024). "Taiwan to stop large data centers in the North, cites insufficient power". DatacenterDynamics. Archived from the original on 8 November 2024. Retrieved 7 November 2024. Mochizuki, Takashi; Oda, Shoko (18 October 2024). "エヌビディア出資の日本企業、原発近くでAIデータセンター新設検討". Bloomberg (in Japanese). Archived from the original on 8 November 2024. Retrieved 7 November 2024. Malik, Naureen S.; Wade, Will (5 November 2024). "Nuclear-Hungry AI Campuses Need New Plan to Find Power Fast". Bloomberg. "Energy and AI Executive summary". International Energy Agency. Retrieved 10 April 2025. Nicas (2018). Rainie, Lee; Keeter, Scott; Perrin, Andrew (22 July 2019). "Trust and Distrust in America". Pew Research Center. Archived from the original on 22 February 2024. Kosoff, Maya (8 February 2018). "YouTube Struggles to Contain Its Conspiracy Problem". Vanity Fair. Retrieved 10 April 2025. Berry, David M. (19 March 2025). "Synthetic media and computational capitalism: towards a critical theory of artificial intelligence". AI & Society. 
doi:10.1007/s00146-025-02265-2. ISSN 1435-5655. "Unreal: A quantum leap in AI video". The Week. 17 June 2025. Retrieved 20 June 2025. Snow, Jackie. "AI video is getting real. Beware what comes next". Quartz. Retrieved 20 June 2025. Chow, Andrew R.; Perrigo, Billy (3 June 2025). "Google's New AI Tool Generates Convincing Deepfakes of Riots, Conflict, and Election Fraud". Time. Retrieved 20 June 2025. Williams (2023). Olanipekun, Samson Olufemi (2025). "Computational propaganda and misinformation: AI technologies as tools of media manipulation". World Journal of Advanced Research and Reviews. 25 (1): 911–923. doi:10.30574/wjarr.2025.25.1.0131. ISSN 2581-9615. Taylor & Hern (2023). "To fight AI, we need 'personhood credentials,' say AI firms". Archived from the original on 24 April 2025. Retrieved 9 May 2025. Samuel, Sigal (19 April 2022). "Why it's so damn hard to make AI fair and unbiased". Vox. Archived from the original on 5 October 2024. Retrieved 24 July 2024. Rose (2023). CNA (2019). Goffrey (2008), p. 17. Berdahl et al. (2023); Goffrey (2008, p. 17); Rose (2023); Russell & Norvig (2021, p. 995) Christian (2020), p. 25. Russell & Norvig (2021), p. 995. Grant & Hill (2023). Larson & Angwin (2016). Christian (2020), pp. 67–70. Christian (2020, pp. 67–70); Russell & Norvig (2021, pp. 993–994) Russell & Norvig (2021, p. 995); Lipartito (2011, p. 36); Goodman & Flaxman (2017, p. 6); Christian (2020, pp. 39–40, 65) Quoted in Christian (2020, p. 65). Russell & Norvig (2021, p. 994); Christian (2020, pp. 40, 80–81) Quoted in Christian (2020, p. 80) Dockrill (2022). Sample (2017). "Black Box AI". 16 June 2023. Archived from the original on 15 June 2024. Retrieved 5 October 2024. Christian (2020), p. 110. Christian (2020), pp. 88–91. Christian (2020, p. 83); Russell & Norvig (2021, p. 997) Christian (2020), p. 91. Christian (2020), p. 83. Verma (2021). Rothman (2020). Christian (2020), pp. 105–108. Christian (2020), pp. 108–112. Ropek, Lucas (21 May 2024). 
"New Anthropic Research Sheds Light on AI's 'Black Box'". Gizmodo. Archived from the original on 5 October 2024. Retrieved 23 May 2024. Russell & Norvig (2021), p. 989. Russell & Norvig (2021), pp. 987–990. Russell & Norvig (2021), p. 988. Robitzski (2018); Sainato (2015) Harari (2018). Buckley, Chris; Mozur, Paul (22 May 2019). "How China Uses High-Tech Surveillance to Subdue Minorities". The New York Times. Archived from the original on 25 November 2019. Retrieved 2 July 2019. "Security lapse exposed a Chinese smart city surveillance system". 3 May 2019. Archived from the original on 7 March 2021. Retrieved 14 September 2020. Urbina et al. (2022). E. McGaughey, 'Will Robots Automate Your Job Away? Full Employment, Basic Income, and Economic Democracy' (2022), 51(3) Industrial Law Journal 511–559. Archived 27 May 2023 at the Wayback Machine. Ford & Colvin (2015);McGaughey (2022) IGM Chicago (2017). Arntz, Gregory & Zierahn (2016), p. 33. Lohr (2017); Frey & Osborne (2017); Arntz, Gregory & Zierahn (2016, p. 33) Zhou, Viola (11 April 2023). "AI is already taking video game illustrators' jobs in China". Rest of World. Archived from the original on 21 February 2024. Retrieved 17 August 2023. Carter, Justin (11 April 2023). "China's game art industry reportedly decimated by growing AI use". Game Developer. Archived from the original on 17 August 2023. Retrieved 17 August 2023. Morgenstern (2015). Mahdawi (2017); Thompson (2014) Tarnoff, Ben (4 August 2023). "Lessons from Eliza". The Guardian Weekly. pp. 34–39. Cellan-Jones (2014). Russell & Norvig 2021, p. 1001. Bostrom (2014). Russell (2019). Bostrom (2014); Müller & Bostrom (2014); Bostrom (2015). Harari (2023). Müller & Bostrom (2014). Leaders' concerns about the existential risks of AI around 2015: Rawlinson (2015), Holley (2015), Gibbs (2014), Sainato (2015) ""Godfather of artificial intelligence" talks impact and potential of new AI". CBS News. 25 March 2023. Archived from the original on 28 March 2023. 
Retrieved 28 March 2023. Pittis, Don (4 May 2023). "Canadian artificial intelligence leader Geoffrey Hinton piles on fears of computer takeover". CBC. Archived from the original on 7 July 2024. Retrieved 5 October 2024. "'50–50 chance' that AI outsmarts humanity, Geoffrey Hinton says". Bloomberg BNN. 14 June 2024. Archived from the original on 14 June 2024. Retrieved 6 July 2024. Valance (2023). Taylor, Josh (7 May 2023). "Rise of artificial intelligence is inevitable but should not be feared, 'father of AI' says". The Guardian. Archived from the original on 23 October 2023. Retrieved 26 May 2023. Colton, Emma (7 May 2023). "'Father of AI' says tech fears misplaced: 'You cannot stop it'". Fox News. Archived from the original on 26 May 2023. Retrieved 26 May 2023. Jones, Hessie (23 May 2023). "Juergen Schmidhuber, Renowned 'Father Of Modern AI,' Says His Life's Work Won't Lead To Dystopia". Forbes. Archived from the original on 26 May 2023. Retrieved 26 May 2023. McMorrow, Ryan (19 December 2023). "Andrew Ng: 'Do we think the world is better off with more or less intelligence?'". Financial Times. Archived from the original on 25 January 2024. Retrieved 30 December 2023. Levy, Steven (22 December 2023). "How Not to Be Stupid About AI, With Yann LeCun". Wired. Archived from the original on 28 December 2023. Retrieved 30 December 2023. Arguments that AI is not an imminent risk: Brooks (2014), Geist (2015), Madrigal (2015), Lee (2014) Christian (2020), pp. 67, 73. Yudkowsky (2008). Anderson & Anderson (2011). AAAI (2014). Wallach (2010). Russell (2019), p. 173. Stewart, Ashley; Melton, Monica. "Hugging Face CEO says he's focused on building a 'sustainable model' for the $4.5 billion open-source-AI startup". Business Insider. Archived from the original on 25 September 2024. Retrieved 14 April 2024. Wiggers, Kyle (9 April 2024). "Google open sources tools to support AI model development". TechCrunch. Archived from the original on 10 September 2024. 
Retrieved 14 April 2024. Heaven, Will Douglas (12 May 2023). "The open-source AI boom is built on Big Tech's handouts. How long will it last?". MIT Technology Review. Retrieved 14 April 2024. Brodsky, Sascha (19 December 2023). "Mistral AI's New Language Model Aims for Open Source Supremacy". AI Business. Archived from the original on 5 September 2024. Retrieved 5 October 2024. Edwards, Benj (22 February 2024). "Stability announces Stable Diffusion 3, a next-gen AI image generator". Ars Technica. Archived from the original on 5 October 2024. Retrieved 14 April 2024. Marshall, Matt (29 January 2024). "How enterprises are using open source LLMs: 16 examples". VentureBeat. Archived from the original on 26 September 2024. Retrieved 5 October 2024. Piper, Kelsey (2 February 2024). "Should we make our most powerful AI models open source to all?". Vox. Archived from the original on 5 October 2024. Retrieved 14 April 2024. Alan Turing Institute (2019). "Understanding artificial intelligence ethics and safety" (PDF). Archived (PDF) from the original on 11 September 2024. Retrieved 5 October 2024. Alan Turing Institute (2023). "AI Ethics and Governance in Practice" (PDF). Archived (PDF) from the original on 11 September 2024. Retrieved 5 October 2024. Floridi, Luciano; Cowls, Josh (23 June 2019). "A Unified Framework of Five Principles for AI in Society". Harvard Data Science Review. 1 (1). doi:10.1162/99608f92.8cd550d1. S2CID 198775713. Archived from the original on 7 August 2019. Retrieved 5 December 2023. Buruk, Banu; Ekmekci, Perihan Elif; Arda, Berna (1 September 2020). "A critical perspective on guidelines for responsible and trustworthy artificial intelligence". Medicine, Health Care and Philosophy. 23 (3): 387–399. doi:10.1007/s11019-020-09948-1. ISSN 1572-8633. PMID 32236794. S2CID 214766800. Archived from the original on 5 October 2024. Retrieved 5 October 2024. Kamila, Manoj Kumar; Jasrotia, Sahil Singh (1 January 2023). 
"Ethical issues in the development of artificial intelligence: recognizing the risks". International Journal of Ethics and Systems. 41 (ahead-of-print): 45–63. doi:10.1108/IJOES-05-2023-0107. ISSN 2514-9369. S2CID 259614124. Archived from the original on 5 October 2024. Retrieved 5 October 2024. "AI Safety Institute releases new AI safety evaluations platform". UK Government. 10 May 2024. Archived from the original on 5 October 2024. Retrieved 14 May 2024. Regulation of AI to mitigate risks: Berryhill et al. (2019), Barfield & Pagallo (2018), Iphofen & Kritikos (2019), Wirtz, Weyerer & Geyer (2018), Buiten (2019) Law Library of Congress (U.S.). Global Legal Research Directorate (2019). Vincent (2023). Stanford University (2023). UNESCO (2021). Kissinger (2021). Altman, Brockman & Sutskever (2023). VOA News (25 October 2023). "UN Announces Advisory Body on Artificial Intelligence". Archived from the original on 18 September 2024. Retrieved 5 October 2024. "Council of Europe opens first ever global treaty on AI for signature". Council of Europe. 5 September 2024. Archived from the original on 17 September 2024. Retrieved 17 September 2024. Edwards (2023). Kasperowicz (2023). Fox News (2023). Milmo, Dan (3 November 2023). "Hope or Horror? The great AI debate dividing its pioneers". The Guardian Weekly. pp. 10–12. "The Bletchley Declaration by Countries Attending the AI Safety Summit, 1–2 November 2023". GOV.UK. 1 November 2023. Archived from the original on 1 November 2023. Retrieved 2 November 2023. "Countries agree to safe and responsible development of frontier AI in landmark Bletchley Declaration". GOV.UK (Press release). Archived from the original on 1 November 2023. Retrieved 1 November 2023. "Second global AI summit secures safety commitments from companies". Reuters. 21 May 2024. Retrieved 23 May 2024. "Frontier AI Safety Commitments, AI Seoul Summit 2024". gov.uk. 21 May 2024. Archived from the original on 23 May 2024. Retrieved 23 May 2024. 
Buntz, Brian (3 November 2024). "Quality vs. quantity: US and China chart different paths in global AI patent race in 2024 / Geographical breakdown of AI patents in 2024". R&D World. Archived from the original on 9 December 2024. Russell & Norvig 2021, p. 9. Copeland, J., ed. (2004). The Essential Turing: the ideas that gave birth to the computer age. Oxford, England: Clarendon Press. ISBN 0-1982-5079-7. "Google books ngram". Archived from the original on 5 October 2024. Retrieved 5 October 2024. AI's immediate precursors: McCorduck (2004, pp. 51–107), Crevier (1993, pp. 27–32), Russell & Norvig (2021, pp. 8–17), Moravec (1988, p. 3) Turing's original publication of the Turing test in "Computing machinery and intelligence": Turing (1950) Historical influence and philosophical implications: Haugeland (1985, pp. 6–9), Crevier (1993, p. 24), McCorduck (2004, pp. 70–71), Russell & Norvig (2021, pp. 2, 984) Crevier (1993), pp. 47–49. Russell & Norvig (2003), p. 17. Russell & Norvig (2003), p. 18. Newquist (1994), p. 86. Simon (1965, p. 96) quoted in Crevier (1993, p. 109) Minsky (1967, p. 2) quoted in Crevier (1993, p. 109) Russell & Norvig (2021), p. 21. Lighthill (1973). NRC 1999, pp. 212–213. Russell & Norvig (2021), p. 22. Expert systems: Russell & Norvig (2021, pp. 23, 292), Luger & Stubblefield (2004, pp. 227–331), Nilsson (1998, chpt. 17.4), McCorduck (2004, pp. 327–335, 434–435), Crevier (1993, pp. 145–162, 197–203), Newquist (1994, pp. 155–183) Russell & Norvig (2021), p. 24. Nilsson (1998), p. 7. McCorduck (2004), pp. 454–462. Moravec (1988). Brooks (1990). Developmental robotics: Weng et al. (2001), Lungarella et al. (2003), Asada et al. (2009), Oudeyer (2010) Russell & Norvig (2021), p. 25. Crevier (1993, pp. 214–215), Russell & Norvig (2021, pp. 24, 26) Russell & Norvig (2021), p. 26. Formal and narrow methods adopted in the 1990s: Russell & Norvig (2021, pp. 24–26), McCorduck (2004, pp. 486–487) AI widely used in the late 1990s: Kurzweil (2005, p. 
265), NRC (1999, pp. 216–222), Newquist (1994, pp. 189–201) Wong (2023). Moore's Law and AI: Russell & Norvig (2021, pp. 14, 27) Clark (2015b). Big data: Russell & Norvig (2021, p. 26) Sagar, Ram (3 June 2020). "OpenAI Releases GPT-3, The Largest Model So Far". Analytics India Magazine. Archived from the original on 4 August 2020. Retrieved 15 March 2023. Milmo, Dan (2 February 2023). "ChatGPT reaches 100 million users two months after launch". The Guardian. ISSN 0261-3077. Archived from the original on 3 February 2023. Retrieved 31 December 2024. Gorichanaz, Tim (29 November 2023). "ChatGPT turns 1: AI chatbot's success says as much about humans as technology". The Conversation. Archived from the original on 31 December 2024. Retrieved 31 December 2024. DiFeliciantonio (2023). Goswami (2023). "Nearly 1 in 4 new startups is an AI company". PitchBook. 24 December 2024. Retrieved 3 January 2025. Grayling, Anthony; Ball, Brian (1 August 2024). "Philosophy is crucial in the age of AI". The Conversation. Archived from the original on 5 October 2024. Retrieved 4 October 2024. Jarow, Oshan (15 June 2024). "Will AI ever become conscious? It depends on how you think about biology". Vox. Archived from the original on 21 September 2024. Retrieved 4 October 2024. McCarthy, John. "The Philosophy of AI and the AI of Philosophy". jmc.stanford.edu. Archived from the original on 23 October 2018. Retrieved 3 October 2024. Turing (1950), p. 1. Turing (1950), Under "The Argument from Consciousness". Kirk-Giannini, Cameron Domenico; Goldstein, Simon (16 October 2023). "AI is closer than ever to passing the Turing test for 'intelligence'. What happens when it does?". The Conversation. Archived from the original on 25 September 2024. Retrieved 17 August 2024. Russell & Norvig (2021), p. 3. Maker (2006). McCarthy (1999). Minsky (1986). "What Is Artificial Intelligence (AI)?". Google Cloud Platform. Archived from the original on 31 July 2023. Retrieved 16 October 2023. 
"One of the Biggest Problems in Regulating AI Is Agreeing on a Definition". Carnegie Endowment for International Peace. Retrieved 31 July 2024. "AI or BS? How to tell if a marketing tool really uses artificial intelligence". The Drum. Retrieved 31 July 2024. Musser, George (1 September 2023). "How AI Knows Things No One Told It". Scientific American. Retrieved 17 July 2025. Nilsson (1983), p. 10. Haugeland (1985), pp. 112–117. Physical symbol system hypothesis: Newell & Simon (1976, p. 116) Historical significance: McCorduck (2004, p. 153), Russell & Norvig (2021, p. 19) Moravec's paradox: Moravec (1988, pp. 15–16), Minsky (1986, p. 29), Pinker (2007, pp. 190–191) Dreyfus' critique of AI: Dreyfus (1972), Dreyfus & Dreyfus (1986) Historical significance and philosophical implications: Crevier (1993, pp. 120–132), McCorduck (2004, pp. 211–239), Russell & Norvig (2021, pp. 981–982), Fearn (2007, chpt. 3) Crevier (1993), p. 125. Langley (2011). Katz (2012). Neats vs. scruffies, the historic debate: McCorduck (2004, pp. 421–424, 486–489), Crevier (1993, p. 168), Nilsson (1983, pp. 10–11), Russell & Norvig (2021, p. 24) A classic example of the "scruffy" approach to intelligence: Minsky (1986) A modern example of neat AI and its aspirations in the 21st century: Domingos (2015) Pennachin & Goertzel (2007). Roberts (2016). Russell & Norvig (2021), p. 986. Chalmers (1995). Dennett (1991). Horst (2005). Searle (1999). Searle (1980), p. 1. Russell & Norvig (2021), p. 9817. Searle's Chinese room argument: Searle (1980). Searle's original presentation of the thought experiment., Searle (1999). Discussion: Russell & Norvig (2021, pp. 985), McCorduck (2004, pp. 443–445), Crevier (1993, pp. 269–271) Leith, Sam (7 July 2022). "Nick Bostrom: How can we be certain a machine isn't conscious?". The Spectator. Archived from the original on 26 September 2024. Retrieved 23 February 2024. Thomson, Jonny (31 October 2022). "Why don't robots have rights?". Big Think. 
Archived from the original on 13 September 2024. Retrieved 23 February 2024. Kateman, Brian (24 July 2023). "AI Should Be Terrified of Humans". Time. Archived from the original on 25 September 2024. Retrieved 23 February 2024. Wong, Jeff (10 July 2023). "What leaders need to know about robot rights". Fast Company. Hern, Alex (12 January 2017). "Give robots 'personhood' status, EU committee argues". The Guardian. ISSN 0261-3077. Archived from the original on 5 October 2024. Retrieved 23 February 2024. Dovey, Dana (14 April 2018). "Experts Don't Think Robots Should Have Rights". Newsweek. Archived from the original on 5 October 2024. Retrieved 23 February 2024. Cuddy, Alice (13 April 2018). "Robot rights violate human rights, experts warn EU". euronews. Archived from the original on 19 September 2024. Retrieved 23 February 2024. The Intelligence explosion and technological singularity: Russell & Norvig (2021, pp. 1004–1005), Omohundro (2008), Kurzweil (2005) I. J. Good's "intelligence explosion": Good (1965) Vernor Vinge's "singularity": Vinge (1993) Russell & Norvig (2021), p. 1005. Transhumanism: Moravec (1988), Kurzweil (2005), Russell & Norvig (2021, p. 1005) AI as evolution: Edward Fredkin is quoted in McCorduck (2004, p. 401), Butler (1863), Dyson (1998) AI in myth: McCorduck (2004, pp. 4–5) McCorduck (2004), pp. 340–400. Buttazzo (2001). Anderson (2008). McCauley (2007). Galvan (1997). AI textbooks The two most widely used textbooks in 2023 (see the Open Syllabus): Russell, Stuart J.; Norvig, Peter (2021). Artificial Intelligence: A Modern Approach (4th ed.). Hoboken: Pearson. ISBN 978-0-1346-1099-3. LCCN 20190474. Rich, Elaine; Knight, Kevin; Nair, Shivashankar B (2010). Artificial Intelligence (3rd ed.). New Delhi: Tata McGraw Hill India. ISBN 978-0-0700-8770-5. The four most widely used AI textbooks in 2008: Luger, George; Stubblefield, William (2004). Artificial Intelligence: Structures and Strategies for Complex Problem Solving (5th ed.). 
Benjamin/Cummings. ISBN 978-0-8053-4780-7. Archived from the original on 26 July 2020. Retrieved 17 December 2019. Nilsson, Nils (1998). Artificial Intelligence: A New Synthesis. Morgan Kaufmann. ISBN 978-1-5586-0467-4. Archived from the original on 26 July 2020. Retrieved 18 November 2019. Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, New Jersey: Prentice Hall, ISBN 0-13-790395-2. Poole, David; Mackworth, Alan; Goebel, Randy (1998). Computational Intelligence: A Logical Approach. New York: Oxford University Press. ISBN 978-0-1951-0270-3. Archived from the original on 26 July 2020. Retrieved 22 August 2020. Later edition: Poole, David; Mackworth, Alan (2017). Artificial Intelligence: Foundations of Computational Agents (2nd ed.). Cambridge University Press. ISBN 978-1-1071-9539-4. Archived from the original on 7 December 2017. Retrieved 6 December 2017. Other textbooks: Ertel, Wolfgang (2017). Introduction to Artificial Intelligence (2nd ed.). Springer. ISBN 978-3-3195-8486-7. Ciaramella, Alberto; Ciaramella, Marco (2024). Introduction to Artificial Intelligence: from data analysis to generative AI (1st ed.). Intellisemantic Editions. ISBN 978-8-8947-8760-3. History of AI Crevier, Daniel (1993). AI: The Tumultuous Search for Artificial Intelligence. New York, NY: BasicBooks. ISBN 0-465-02997-3. McCorduck, Pamela (2004), Machines Who Think (2nd ed.), Natick, Massachusetts: A. K. Peters, ISBN 1-5688-1205-1 Newquist, H. P. (1994). The Brain Makers: Genius, Ego, And Greed In The Quest For Machines That Think. New York: Macmillan/SAMS. ISBN 978-0-6723-0412-5. Harmon, Paul; Sawyer, Brian (1990). Creating Expert Systems for Business and Industry. New York: John Wiley & Sons. ISBN 0471614963. Other sources AI & ML in Fusion AI & ML in Fusion, video lecture Archived 2 July 2023 at the Wayback Machine Alter, Alexandra; Harris, Elizabeth A. 
(20 September 2023), "Franzen, Grisham and Other Prominent Authors Sue OpenAI", The New York Times, archived from the original on 14 September 2024, retrieved 5 October 2024 Altman, Sam; Brockman, Greg; Sutskever, Ilya (22 May 2023). "Governance of Superintelligence". openai.com. Archived from the original on 27 May 2023. Retrieved 27 May 2023. Anderson, Susan Leigh (2008). "Asimov's "three laws of robotics" and machine metaethics". AI & Society. 22 (4): 477–493. doi:10.1007/s00146-007-0094-5. S2CID 1809459. Anderson, Michael; Anderson, Susan Leigh (2011). Machine Ethics. Cambridge University Press. Arntz, Melanie; Gregory, Terry; Zierahn, Ulrich (2016), "The risk of automation for jobs in OECD countries: A comparative analysis", OECD Social, Employment, and Migration Working Papers 189 Asada, M.; Hosoda, K.; Kuniyoshi, Y.; Ishiguro, H.; Inui, T.; Yoshikawa, Y.; Ogino, M.; Yoshida, C. (2009). "Cognitive developmental robotics: a survey". IEEE Transactions on Autonomous Mental Development. 1 (1): 12–34. doi:10.1109/tamd.2009.2021702. S2CID 10168773. "Ask the AI experts: What's driving today's progress in AI?". McKinsey & Company. Archived from the original on 13 April 2018. Retrieved 13 April 2018. Barfield, Woodrow; Pagallo, Ugo (2018). Research handbook on the law of artificial intelligence. Cheltenham, UK: Edward Elgar Publishing. ISBN 978-1-7864-3904-8. OCLC 1039480085. Beal, J.; Winston, Patrick (2009), "The New Frontier of Human-Level Artificial Intelligence", IEEE Intelligent Systems, vol. 24, pp. 21–24, doi:10.1109/MIS.2009.75, hdl:1721.1/52357, S2CID 32437713 Berdahl, Carl Thomas; Baker, Lawrence; Mann, Sean; Osoba, Osonde; Girosi, Federico (7 February 2023). "Strategies to Improve the Impact of Artificial Intelligence on Health Equity: Scoping Review". JMIR AI. 2: e42936. doi:10.2196/42936. ISSN 2817-1705. PMC 11041459. PMID 38875587. S2CID 256681439. Berryhill, Jamie; Heang, Kévin Kok; Clogher, Rob; McBride, Keegan (2019). 
Hello, World: Artificial Intelligence and its Use in the Public Sector (PDF). Paris: OECD Observatory of Public Sector Innovation. Archived (PDF) from the original on 20 December 2019. Retrieved 9 August 2020. Bertini, M; Del Bimbo, A; Torniai, C (2006). "Automatic annotation and semantic retrieval of video sequences using multimedia ontologies". MM '06 Proceedings of the 14th ACM international conference on Multimedia. 14th ACM international conference on Multimedia. Santa Barbara: ACM. pp. 679–682. Bostrom, Nick (2014). Superintelligence: Paths, Dangers, Strategies. Oxford University Press. Bostrom, Nick (2015). "What happens when our computers get smarter than we are?". TED (conference). Archived from the original on 25 July 2020. Retrieved 30 January 2020. Brooks, Rodney (10 November 2014). "artificial intelligence is a tool, not a threat". Archived from the original on 12 November 2014. Brooks, Rodney (1990). "Elephants Don't Play Chess" (PDF). Robotics and Autonomous Systems. 6 (1–2): 3–15. CiteSeerX 10.1.1.588.7539. doi:10.1016/S0921-8890(05)80025-9. Archived (PDF) from the original on 9 August 2007. Buiten, Miriam C (2019). "Towards Intelligent Regulation of Artificial Intelligence". European Journal of Risk Regulation. 10 (1): 41–59. doi:10.1017/err.2019.8. ISSN 1867-299X. Bushwick, Sophie (16 March 2023), "What the New GPT-4 AI Can Do", Scientific American, archived from the original on 22 August 2023, retrieved 5 October 2024 Butler, Samuel (13 June 1863). "Darwin among the Machines". Letters to the Editor. The Press. Christchurch, New Zealand. Archived from the original on 19 September 2008. Retrieved 16 October 2014 – via Victoria University of Wellington. Buttazzo, G. (July 2001). "Artificial consciousness: Utopia or real possibility?". Computer. 34 (7): 24–30. doi:10.1109/2.933500. Cambria, Erik; White, Bebo (May 2014). "Jumping NLP Curves: A Review of Natural Language Processing Research [Review Article]". IEEE Computational Intelligence Magazine. 
9 (2): 48–57. doi:10.1109/MCI.2014.2307227. S2CID 206451986. Cellan-Jones, Rory (2 December 2014). "Stephen Hawking warns artificial intelligence could end mankind". BBC News. Archived from the original on 30 October 2015. Retrieved 30 October 2015. Chalmers, David (1995). "Facing up to the problem of consciousness". Journal of Consciousness Studies. 2 (3): 200–219. CiteSeerX 10.1.1.103.8362. Archived from the original on 8 March 2005. Retrieved 11 October 2018. Challa, Subhash; Moreland, Mark R.; Mušicki, Darko; Evans, Robin J. (2011). Fundamentals of Object Tracking. Cambridge University Press. doi:10.1017/CBO9780511975837. ISBN 978-0-5218-7628-5. Christian, Brian (2020). The Alignment Problem: Machine learning and human values. W. W. Norton & Company. ISBN 978-0-3938-6833-3. OCLC 1233266753. Ciresan, D.; Meier, U.; Schmidhuber, J. (2012). "Multi-column deep neural networks for image classification". 2012 IEEE Conference on Computer Vision and Pattern Recognition. pp. 3642–3649. arXiv:1202.2745. doi:10.1109/cvpr.2012.6248110. ISBN 978-1-4673-1228-8. S2CID 2161592. Clark, Jack (2015b). "Why 2015 Was a Breakthrough Year in Artificial Intelligence". Bloomberg.com. Archived from the original on 23 November 2016. Retrieved 23 November 2016. CNA (12 January 2019). "Commentary: Bad news. Artificial intelligence is biased". CNA. Archived from the original on 12 January 2019. Retrieved 19 June 2020. Cybenko, G. (1988). Continuous valued neural networks with two hidden layers are sufficient (Report). Department of Computer Science, Tufts University. Deng, L.; Yu, D. (2014). "Deep Learning: Methods and Applications" (PDF). Foundations and Trends in Signal Processing. 7 (3–4): 197–387. doi:10.1561/2000000039. Archived (PDF) from the original on 14 March 2016. Retrieved 18 October 2014. Dennett, Daniel (1991). Consciousness Explained. The Penguin Press. ISBN 978-0-7139-9037-9. DiFeliciantonio, Chase (3 April 2023). "AI has already changed the world. This report shows how". 
San Francisco Chronicle. Archived from the original on 19 June 2023. Retrieved 19 June 2023. Dickson, Ben (2 May 2022). "Machine learning: What is the transformer architecture?". TechTalks. Archived from the original on 22 November 2023. Retrieved 22 November 2023. Dockrill, Peter (27 June 2022), "Robots With Flawed AI Make Sexist And Racist Decisions, Experiment Shows", Science Alert, archived from the original on 27 June 2022 Domingos, Pedro (2015). The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. Basic Books. ISBN 978-0-4650-6570-7. Dreyfus, Hubert (1972). What Computers Can't Do. New York: MIT Press. ISBN 978-0-0601-1082-6. Dreyfus, Hubert; Dreyfus, Stuart (1986). Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer. Oxford: Blackwell. ISBN 978-0-0290-8060-3. Archived from the original on 26 July 2020. Retrieved 22 August 2020. Dyson, George (1998). Darwin among the Machines. Allen Lane Science. ISBN 978-0-7382-0030-9. Archived from the original on 26 July 2020. Retrieved 22 August 2020. Edelson, Edward (1991). The Nervous System. New York: Chelsea House. ISBN 978-0-7910-0464-7. Archived from the original on 26 July 2020. Retrieved 18 November 2019. Edwards, Benj (17 May 2023). "Poll: AI poses risk to humanity, according to majority of Americans". Ars Technica. Archived from the original on 19 June 2023. Retrieved 19 June 2023. Fearn, Nicholas (2007). The Latest Answers to the Oldest Questions: A Philosophical Adventure with the World's Greatest Thinkers. New York: Grove Press. ISBN 978-0-8021-1839-4. Ford, Martin; Colvin, Geoff (6 September 2015). "Will robots create more jobs than they destroy?". The Guardian. Archived from the original on 16 June 2018. Retrieved 13 January 2018. Fox News (2023). "Fox News Poll" (PDF). Fox News. Archived (PDF) from the original on 12 May 2023. Retrieved 19 June 2023. Frey, Carl Benedikt; Osborne, Michael A (1 January 2017).
"The future of employment: How susceptible are jobs to computerisation?". Technological Forecasting and Social Change. 114: 254–280. CiteSeerX 10.1.1.395.416. doi:10.1016/j.techfore.2016.08.019. ISSN 0040-1625. "From not working to neural networking". The Economist. 2016. Archived from the original on 31 December 2016. Retrieved 26 April 2018. Galvan, Jill (1 January 1997). "Entering the Posthuman Collective in Philip K. Dick's "Do Androids Dream of Electric Sheep?"". Science Fiction Studies. 24 (3): 413–429. doi:10.1525/sfs.24.3.0413. JSTOR 4240644. Geist, Edward Moore (9 August 2015). "Is artificial intelligence really an existential threat to humanity?". Bulletin of the Atomic Scientists. Archived from the original on 30 October 2015. Retrieved 30 October 2015. Gibbs, Samuel (27 October 2014). "Elon Musk: artificial intelligence is our biggest existential threat". The Guardian. Archived from the original on 30 October 2015. Retrieved 30 October 2015. Goffrey, Andrew (2008). "Algorithm". In Fuller, Matthew (ed.). Software studies: a lexicon. Cambridge, Mass.: MIT Press. pp. 15–20. ISBN 978-1-4356-4787-9. Goldman, Sharon (14 September 2022). "10 years later, deep learning 'revolution' rages on, say AI pioneers Hinton, LeCun and Li". VentureBeat. Archived from the original on 5 October 2024. Retrieved 8 December 2023. Good, I. J. (1965), Speculations Concerning the First Ultraintelligent Machine, archived from the original on 10 July 2023, retrieved 5 October 2024 Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016), Deep Learning, MIT Press., archived from the original on 16 April 2016, retrieved 12 November 2017 Goodman, Bryce; Flaxman, Seth (2017). "EU regulations on algorithmic decision-making and a 'right to explanation'". AI Magazine. 38 (3): 50. arXiv:1606.08813. doi:10.1609/aimag.v38i3.2741. S2CID 7373959. Government Accountability Office (13 September 2022). Consumer Data: Increasing Use Poses Risks to Privacy. gao.gov (Report). 
Archived from the original on 13 September 2024. Retrieved 5 October 2024. Grant, Nico; Hill, Kashmir (22 May 2023). "Google's Photo App Still Can't Find Gorillas. And Neither Can Apple's". The New York Times. Archived from the original on 14 September 2024. Retrieved 5 October 2024. Goswami, Rohan (5 April 2023). "Here's where the A.I. jobs are". CNBC. Archived from the original on 19 June 2023. Retrieved 19 June 2023. Harari, Yuval Noah (October 2018). "Why Technology Favors Tyranny". The Atlantic. Archived from the original on 25 September 2021. Retrieved 23 September 2021. Harari, Yuval Noah (2023). "AI and the future of humanity". YouTube. Archived from the original on 30 September 2024. Retrieved 5 October 2024. Haugeland, John (1985). Artificial Intelligence: The Very Idea. Cambridge, Mass.: MIT Press. ISBN 978-0-2620-8153-5. Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; Mohamed, A.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.; Kingsbury, B. (2012). "Deep Neural Networks for Acoustic Modeling in Speech Recognition – The shared views of four research groups". IEEE Signal Processing Magazine. 29 (6): 82–97. Bibcode:2012ISPM...29...82H. doi:10.1109/msp.2012.2205597. S2CID 206485943. Holley, Peter (28 January 2015). "Bill Gates on dangers of artificial intelligence: 'I don't understand why some people are not concerned'". The Washington Post. ISSN 0190-8286. Archived from the original on 30 October 2015. Retrieved 30 October 2015. Hornik, Kurt; Stinchcombe, Maxwell; White, Halbert (1989). Multilayer Feedforward Networks are Universal Approximators (PDF). Neural Networks. Vol. 2. Pergamon Press. pp. 359–366. Archived (PDF) from the original on 21 April 2023. Retrieved 5 October 2024. Horst, Steven (2005). "The Computational Theory of Mind". The Stanford Encyclopedia of Philosophy. Archived from the original on 6 March 2016. Retrieved 7 March 2016. Howe, J. (November 1994). "Artificial Intelligence at Edinburgh University: a Perspective". 
Archived from the original on 15 May 2007. Retrieved 30 August 2007. IGM Chicago (30 June 2017). "Robots and Artificial Intelligence". igmchicago.org. Archived from the original on 1 May 2019. Retrieved 3 July 2019. Iphofen, Ron; Kritikos, Mihalis (3 January 2019). "Regulating artificial intelligence and robotics: ethics by design in a digital society". Contemporary Social Science. 16 (2): 170–184. doi:10.1080/21582041.2018.1563803. ISSN 2158-2041. S2CID 59298502. Jordan, M. I.; Mitchell, T. M. (16 July 2015). "Machine learning: Trends, perspectives, and prospects". Science. 349 (6245): 255–260. Bibcode:2015Sci...349..255J. doi:10.1126/science.aaa8415. PMID 26185243. S2CID 677218. Kahneman, Daniel (2011). Thinking, Fast and Slow. Macmillan. ISBN 978-1-4299-6935-2. Archived from the original on 15 March 2023. Retrieved 8 April 2012. Kahneman, Daniel; Slovic, D.; Tversky, Amos (1982). "Judgment under uncertainty: Heuristics and biases". Science. 185 (4157). New York: Cambridge University Press: 1124–1131. Bibcode:1974Sci...185.1124T. doi:10.1126/science.185.4157.1124. ISBN 978-0-5212-8414-1. PMID 17835457. S2CID 143452957. Kasperowicz, Peter (1 May 2023). "Regulate AI? GOP much more skeptical than Dems that government can do it right: poll". Fox News. Archived from the original on 19 June 2023. Retrieved 19 June 2023. Katz, Yarden (1 November 2012). "Noam Chomsky on Where Artificial Intelligence Went Wrong". The Atlantic. Archived from the original on 28 February 2019. Retrieved 26 October 2014. "Kismet". MIT Artificial Intelligence Laboratory, Humanoid Robotics Group. Archived from the original on 17 October 2014. Retrieved 25 October 2014. Kissinger, Henry (1 November 2021). "The Challenge of Being Human in the Age of AI". The Wall Street Journal. Archived from the original on 4 November 2021. Retrieved 4 November 2021. Kobielus, James (27 November 2019). "GPUs Continue to Dominate the AI Accelerator Market for Now". InformationWeek. 
Archived from the original on 19 October 2021. Retrieved 11 June 2020. Kuperman, G. J.; Reichley, R. M.; Bailey, T. C. (1 July 2006). "Using Commercial Knowledge Bases for Clinical Decision Support: Opportunities, Hurdles, and Recommendations". Journal of the American Medical Informatics Association. 13 (4): 369–371. doi:10.1197/jamia.M2055. PMC 1513681. PMID 16622160. Kurzweil, Ray (2005). The Singularity is Near. Penguin Books. ISBN 978-0-6700-3384-3. Langley, Pat (2011). "The changing science of machine learning". Machine Learning. 82 (3): 275–279. doi:10.1007/s10994-011-5242-y. Larson, Jeff; Angwin, Julia (23 May 2016). "How We Analyzed the COMPAS Recidivism Algorithm". ProPublica. Archived from the original on 29 April 2019. Retrieved 19 June 2020. Laskowski, Nicole (November 2023). "What is Artificial Intelligence and How Does AI Work? TechTarget". Enterprise AI. Archived from the original on 5 October 2024. Retrieved 30 October 2023. Law Library of Congress (U.S.). Global Legal Research Directorate, issuing body. (2019). Regulation of artificial intelligence in selected jurisdictions. LCCN 2019668143. OCLC 1110727808. Lee, Timothy B. (22 August 2014). "Will artificial intelligence destroy humanity? Here are 5 reasons not to worry". Vox. Archived from the original on 30 October 2015. Retrieved 30 October 2015. Lenat, Douglas; Guha, R. V. (1989). Building Large Knowledge-Based Systems. Addison-Wesley. ISBN 978-0-2015-1752-1. Lighthill, James (1973). "Artificial Intelligence: A General Survey". Artificial Intelligence: a paper symposium. Science Research Council. Lipartito, Kenneth (6 January 2011), The Narrative and the Algorithm: Genres of Credit Reporting from the Nineteenth Century to Today (PDF) (Unpublished manuscript), doi:10.2139/ssrn.1736283, S2CID 166742927, archived (PDF) from the original on 9 October 2022 Lohr, Steve (2017). "Robots Will Take Jobs, but Not as Fast as Some Fear, New Report Says". The New York Times. 
Archived from the original on 14 January 2018. Retrieved 13 January 2018. Lungarella, M.; Metta, G.; Pfeifer, R.; Sandini, G. (2003). "Developmental robotics: a survey". Connection Science. 15 (4): 151–190. Bibcode:2003ConSc..15..151L. CiteSeerX 10.1.1.83.7615. doi:10.1080/09540090310001655110. S2CID 1452734. "Machine Ethics". aaai.org. Archived from the original on 29 November 2014. Madrigal, Alexis C. (27 February 2015). "The case against killer robots, from a guy actually working on artificial intelligence". Fusion.net. Archived from the original on 4 February 2016. Retrieved 31 January 2016. Mahdawi, Arwa (26 June 2017). "What jobs will still be around in 20 years? Read this to prepare your future". The Guardian. Archived from the original on 14 January 2018. Retrieved 13 January 2018. Maker, Meg Houston (2006), AI@50: AI Past, Present, Future, Dartmouth College, archived from the original on 8 October 2008, retrieved 16 October 2008 Marmouyet, Françoise (15 December 2023). "Google's Gemini: is the new AI model really better than ChatGPT?". The Conversation. Archived from the original on 4 March 2024. Retrieved 25 December 2023. Minsky, Marvin (1986), The Society of Mind, Simon and Schuster McCarthy, John; Minsky, Marvin; Rochester, Nathan; Shannon, Claude (1955). "A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence". Archived from the original on 26 August 2007. Retrieved 30 August 2007. McCarthy, John (2007), "From Here to Human-Level AI", Artificial Intelligence, p. 171 McCarthy, John (1999), What is AI?, archived from the original on 4 December 2022, retrieved 4 December 2022 McCauley, Lee (2007). "AI armageddon and the three laws of robotics". Ethics and Information Technology. 9 (2): 153–164. CiteSeerX 10.1.1.85.8904. doi:10.1007/s10676-007-9138-2. S2CID 37272949. McGarry, Ken (1 December 2005). "A survey of interestingness measures for knowledge discovery". The Knowledge Engineering Review. 20 (1): 39–61. 
doi:10.1017/S0269888905000408. S2CID 14987656. McGaughey, E (2022), Will Robots Automate Your Job Away? Full Employment, Basic Income, and Economic Democracy, p. 51(3) Industrial Law Journal 511–559, doi:10.2139/ssrn.3044448, S2CID 219336439, SSRN 3044448, archived from the original on 31 January 2021, retrieved 27 May 2023 Merkle, Daniel; Middendorf, Martin (2013). "Swarm Intelligence". In Burke, Edmund K.; Kendall, Graham (eds.). Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques. Springer Science & Business Media. ISBN 978-1-4614-6940-7. Minsky, Marvin (1967), Computation: Finite and Infinite Machines, Englewood Cliffs, N.J.: Prentice-Hall Moravec, Hans (1988). Mind Children. Harvard University Press. ISBN 978-0-6745-7616-2. Archived from the original on 26 July 2020. Retrieved 18 November 2019. Morgenstern, Michael (9 May 2015). "Automation and anxiety". The Economist. Archived from the original on 12 January 2018. Retrieved 13 January 2018. Müller, Vincent C.; Bostrom, Nick (2014). "Future Progress in Artificial Intelligence: A Poll Among Experts" (PDF). AI Matters. 1 (1): 9–11. doi:10.1145/2639475.2639478. S2CID 8510016. Archived (PDF) from the original on 15 January 2016. Neumann, Bernd; Möller, Ralf (January 2008). "On scene interpretation with description logics". Image and Vision Computing. 26 (1): 82–101. doi:10.1016/j.imavis.2007.08.013. S2CID 10767011. Nilsson, Nils (1995), "Eyes on the Prize", AI Magazine, vol. 16, pp. 9–17 Newell, Allen; Simon, H. A. (1976). "Computer Science as Empirical Inquiry: Symbols and Search". Communications of the ACM. 19 (3): 113–126. doi:10.1145/360018.360022. Nicas, Jack (7 February 2018). "How YouTube Drives People to the Internet's Darkest Corners". The Wall Street Journal. ISSN 0099-9660. Archived from the original on 5 October 2024. Retrieved 16 June 2018. Nilsson, Nils (1983). "Artificial Intelligence Prepares for 2001" (PDF). AI Magazine. 1 (1). 
Archived (PDF) from the original on 17 August 2020. Retrieved 22 August 2020. Presidential Address to the Association for the Advancement of Artificial Intelligence. NRC (United States National Research Council) (1999). "Developments in Artificial Intelligence". Funding a Revolution: Government Support for Computing Research. National Academy Press. Omohundro, Steve (2008). The Nature of Self-Improving Artificial Intelligence. presented and distributed at the 2007 Singularity Summit, San Francisco, CA. Oudeyer, P-Y. (2010). "On the impact of robotics in behavioral and cognitive sciences: from insect navigation to human cognitive development" (PDF). IEEE Transactions on Autonomous Mental Development. 2 (1): 2–16. doi:10.1109/tamd.2009.2039057. S2CID 6362217. Archived (PDF) from the original on 3 October 2018. Retrieved 4 June 2013. Pennachin, C.; Goertzel, B. (2007). "Contemporary Approaches to Artificial General Intelligence". Artificial General Intelligence. Cognitive Technologies. Berlin, Heidelberg: Springer. pp. 1–30. doi:10.1007/978-3-540-68677-4_1. ISBN 978-3-5402-3733-4. Pinker, Steven (2007) [1994], The Language Instinct, Perennial Modern Classics, Harper, ISBN 978-0-0613-3646-1 Poria, Soujanya; Cambria, Erik; Bajpai, Rajiv; Hussain, Amir (September 2017). "A review of affective computing: From unimodal analysis to multimodal fusion". Information Fusion. 37: 98–125. doi:10.1016/j.inffus.2017.02.003. hdl:1893/25490. S2CID 205433041. Archived from the original on 23 March 2023. Retrieved 27 April 2021. Rawlinson, Kevin (29 January 2015). "Microsoft's Bill Gates insists AI is a threat". BBC News. Archived from the original on 29 January 2015. Retrieved 30 January 2015. Reisner, Alex (19 August 2023), "Revealed: The Authors Whose Pirated Books are Powering Generative AI", The Atlantic, archived from the original on 3 October 2024, retrieved 5 October 2024 Roberts, Jacob (2016). "Thinking Machines: The Search for Artificial Intelligence". Distillations. Vol. 
2, no. 2. pp. 14–23. Archived from the original on 19 August 2018. Retrieved 20 March 2018. Robitzski, Dan (5 September 2018). "Five experts share what scares them the most about AI". Archived from the original on 8 December 2019. Retrieved 8 December 2019. Rose, Steve (11 July 2023). "AI Utopia or dystopia?". The Guardian Weekly. pp. 42–43. Russell, Stuart (2019). Human Compatible: Artificial Intelligence and the Problem of Control. United States: Viking. ISBN 978-0-5255-5861-3. OCLC 1083694322. Sainato, Michael (19 August 2015). "Stephen Hawking, Elon Musk, and Bill Gates Warn About Artificial Intelligence". Observer. Archived from the original on 30 October 2015. Retrieved 30 October 2015. Sample, Ian (5 November 2017). "Computer says no: why making AIs fair, accountable and transparent is crucial". The Guardian. Archived from the original on 10 October 2022. Retrieved 30 January 2018. Rothman, Denis (7 October 2020). "Exploring LIME Explanations and the Mathematics Behind It". Codemotion. Archived from the original on 25 November 2023. Retrieved 25 November 2023. Scassellati, Brian (2002). "Theory of mind for a humanoid robot". Autonomous Robots. 12 (1): 13–24. doi:10.1023/A:1013298507114. S2CID 1979315. Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview". Neural Networks. 61: 85–117. arXiv:1404.7828. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509. Schmidhuber, Jürgen (2022). "Annotated History of Modern AI and Deep Learning". Archived from the original on 7 August 2023. Retrieved 5 October 2024. Searle, John (1980). "Minds, Brains and Programs" (PDF). Behavioral and Brain Sciences. 3 (3): 417–457. doi:10.1017/S0140525X00005756. S2CID 55303721. Archived (PDF) from the original on 17 March 2019. Retrieved 22 August 2020. Searle, John (1999). Mind, language and society. New York: Basic Books. ISBN 978-0-4650-4521-1. OCLC 231867665. Archived from the original on 26 July 2020. Retrieved 22 August 2020. Simon, H. A. 
(1965), The Shape of Automation for Men and Management, New York: Harper & Row Simonite, Tom (31 March 2016). "How Google Plans to Solve Artificial Intelligence". MIT Technology Review. Archived from the original on 16 September 2024. Retrieved 5 October 2024. Smith, Craig S. (15 March 2023). "ChatGPT-4 Creator Ilya Sutskever on AI Hallucinations and AI Democracy". Forbes. Archived from the original on 18 September 2024. Retrieved 25 December 2023. Smoliar, Stephen W.; Zhang, HongJiang (1994). "Content based video indexing and retrieval". IEEE MultiMedia. 1 (2): 62–72. doi:10.1109/93.311653. S2CID 32710913. Solomonoff, Ray (1956). An Inductive Inference Machine (PDF). Dartmouth Summer Research Conference on Artificial Intelligence. Archived (PDF) from the original on 26 April 2011. Retrieved 22 March 2011 – via std.com, pdf scanned copy of the original. Later published as Solomonoff, Ray (1957). "An Inductive Inference Machine". IRE Convention Record. Vol. Section on Information Theory, part 2. pp. 56–62. Stanford University (2023). "Artificial Intelligence Index Report 2023/Chapter 6: Policy and Governance" (PDF). AI Index. Archived (PDF) from the original on 19 June 2023. Retrieved 19 June 2023. Tao, Jianhua; Tan, Tieniu (2005). Affective Computing and Intelligent Interaction. Affective Computing: A Review. Lecture Notes in Computer Science. Vol. 3784. Springer. pp. 981–995. doi:10.1007/11573548. ISBN 978-3-5402-9621-8. Taylor, Josh; Hern, Alex (2 May 2023). "'Godfather of AI' Geoffrey Hinton quits Google and warns over dangers of misinformation". The Guardian. Archived from the original on 5 October 2024. Retrieved 5 October 2024. Thompson, Derek (23 January 2014). "What Jobs Will the Robots Take?". The Atlantic. Archived from the original on 24 April 2018. Retrieved 24 April 2018. Thro, Ellen (1993). Robotics: The Marriage of Computers and Machines. New York: Facts on File. ISBN 978-0-8160-2628-9. Archived from the original on 26 July 2020. 
Retrieved 22 August 2020. Toews, Rob (3 September 2023). "Transformers Revolutionized AI. What Will Replace Them?". Forbes. Archived from the original on 8 December 2023. Retrieved 8 December 2023. Turing, Alan (October 1950). "Computing Machinery and Intelligence". Mind. 59 (236): 433–460. doi:10.1093/mind/LIX.236.433. ISSN 1460-2113. JSTOR 2251299. S2CID 14636783. UNESCO Science Report: the Race Against Time for Smarter Development. Paris: UNESCO. 2021. ISBN 978-9-2310-0450-6. Archived from the original on 18 June 2022. Retrieved 18 September 2021. Urbina, Fabio; Lentzos, Filippa; Invernizzi, Cédric; Ekins, Sean (7 March 2022). "Dual use of artificial-intelligence-powered drug discovery". Nature Machine Intelligence. 4 (3): 189–191. doi:10.1038/s42256-022-00465-9. PMC 9544280. PMID 36211133. S2CID 247302391. Vallance, Chris (30 May 2023). "Artificial intelligence could lead to extinction, experts warn". BBC News. Archived from the original on 17 June 2023. Retrieved 18 June 2023. Valinsky, Jordan (11 April 2019), "Amazon reportedly employs thousands of people to listen to your Alexa conversations", CNN.com, archived from the original on 26 January 2024, retrieved 5 October 2024 Verma, Yugesh (25 December 2021). "A Complete Guide to SHAP – SHAPley Additive exPlanations for Practitioners". Analytics India Magazine. Archived from the original on 25 November 2023. Retrieved 25 November 2023. Vincent, James (7 November 2019). "OpenAI has published the text-generating AI it said was too dangerous to share". The Verge. Archived from the original on 11 June 2020. Retrieved 11 June 2020. Vincent, James (15 November 2022). "The scary truth about AI copyright is nobody knows what will happen next". The Verge. Archived from the original on 19 June 2023. Retrieved 19 June 2023. Vincent, James (3 April 2023). "AI is entering an era of corporate control". The Verge. Archived from the original on 19 June 2023. Retrieved 19 June 2023. Vinge, Vernor (1993).
"The Coming Technological Singularity: How to Survive in the Post-Human Era". Vision 21: Interdisciplinary Science and Engineering in the Era of Cyberspace: 11. Bibcode:1993vise.nasa...11V. Archived from the original on 1 January 2007. Retrieved 14 November 2011. Waddell, Kaveh (2018). "Chatbots Have Entered the Uncanny Valley". The Atlantic. Archived from the original on 24 April 2018. Retrieved 24 April 2018. Wallach, Wendell (2010). Moral Machines. Oxford University Press. Wason, P. C.; Shapiro, D. (1966). "Reasoning". In Foss, B. M. (ed.). New horizons in psychology. Harmondsworth: Penguin. Archived from the original on 26 July 2020. Retrieved 18 November 2019. Weng, J.; McClelland; Pentland, A.; Sporns, O.; Stockman, I.; Sur, M.; Thelen, E. (2001). "Autonomous mental development by robots and animals" (PDF). Science. 291 (5504): 599–600. doi:10.1126/science.291.5504.599. PMID 11229402. S2CID 54131797. Archived (PDF) from the original on 4 September 2013. Retrieved 4 June 2013 – via msu.edu. "What is 'fuzzy logic'? Are there computers that are inherently fuzzy and do not apply the usual binary logic?". Scientific American. 21 October 1999. Archived from the original on 6 May 2018. Retrieved 5 May 2018. Williams, Rhiannon (28 June 2023), "Humans may be more likely to believe disinformation generated by AI", MIT Technology Review, archived from the original on 16 September 2024, retrieved 5 October 2024 Wirtz, Bernd W.; Weyerer, Jan C.; Geyer, Carolin (24 July 2018). "Artificial Intelligence and the Public Sector – Applications and Challenges". International Journal of Public Administration. 42 (7): 596–615. doi:10.1080/01900692.2018.1498103. ISSN 0190-0692. S2CID 158829602. Archived from the original on 18 August 2020. Retrieved 22 August 2020. 
Wong, Matteo (19 May 2023), "ChatGPT Is Already Obsolete", The Atlantic, archived from the original on 18 September 2024, retrieved 5 October 2024 Yudkowsky, E (2008), "Artificial Intelligence as a Positive and Negative Factor in Global Risk" (PDF), Global Catastrophic Risks, Oxford University Press, 2008, Bibcode:2008gcr..book..303Y, archived (PDF) from the original on 19 October 2013, retrieved 24 September 2021

Further reading

Autor, David H., "Why Are There Still So Many Jobs? The History and Future of Workplace Automation" (2015) 29(3) Journal of Economic Perspectives 3. Boyle, James, The Line: AI and the Future of Personhood, MIT Press, 2024. Cukier, Kenneth, "Ready for Robots? How to Think about the Future of AI", Foreign Affairs, vol. 98, no. 4 (July/August 2019), pp. 192–198. George Dyson, historian of computing, writes (in what might be called "Dyson's Law") that "Any system simple enough to be understandable will not be complicated enough to behave intelligently, while any system complicated enough to behave intelligently will be too complicated to understand." (p. 197.) Computer scientist Alex Pentland writes: "Current AI machine-learning algorithms are, at their core, dead simple stupid. They work, but they work by brute force." (p. 198.) Evans, Woody (2015). "Posthuman Rights: Dimensions of Transhuman Worlds". Teknokultura. 12 (2). doi:10.5209/rev_TK.2015.v12.n2.49072. S2CID 147612763. Frank, Michael (22 September 2023). "US Leadership in Artificial Intelligence Can Shape the 21st Century Global Order". The Diplomat. Archived from the original on 16 September 2024. Retrieved 8 December 2023. "Instead, the United States has developed a new area of dominance that the rest of the world views with a mixture of awe, envy, and resentment: artificial intelligence... From AI models and research to cloud computing and venture capital, U.S.
companies, universities, and research labs – and their affiliates in allied countries – appear to have an enormous lead in both developing cutting-edge AI and commercializing it. The value of U.S. venture capital investments in AI start-ups exceeds that of the rest of the world combined." Gertner, Jon. (2023) "Wikipedia's Moment of Truth: Can the online encyclopedia help teach A.I. chatbots to get their facts right — without destroying itself in the process?" New York Times Magazine (July 18, 2023) online Archived 20 July 2023 at the Wayback Machine Gleick, James, "The Fate of Free Will" (review of Kevin J. Mitchell, Free Agents: How Evolution Gave Us Free Will, Princeton University Press, 2023, 333 pp.), The New York Review of Books, vol. LXXI, no. 1 (18 January 2024), pp. 27–28, 30. "Agency is what distinguishes us from machines. For biological creatures, reason and purpose come from acting in the world and experiencing the consequences. Artificial intelligences – disembodied, strangers to blood, sweat, and tears – have no occasion for that." (p. 30.) Gleick, James, "The Parrot in the Machine" (review of Emily M. Bender and Alex Hanna, The AI Con: How to Fight Big Tech's Hype and Create the Future We Want, Harper, 274 pp.; and James Boyle, The Line: AI and the Future of Personhood, MIT Press, 326 pp.), The New York Review of Books, vol. LXXII, no. 12 (24 July 2025), pp. 43–46. "[C]hatbox 'writing' has a bland, regurgitated quality. Textures are flattened, sharp edges are sanded. No chatbox could ever have said that April is the cruelest month or that fog comes on little cat feet (though they might now, because one of their chief skills is plagiarism). And when synthetically extruded text turns out wrong, it can be comically wrong. When a movie fan asked Google whether a certain actor was in Heat, he received this 'AI Overview': 'No, Angelina Jolie is not in heat.'" (p. 44.) 
Halpern, Sue, "The Coming Tech Autocracy" (review of Verity Harding, AI Needs You: How We Can Change AI's Future and Save Our Own, Princeton University Press, 274 pp.; Gary Marcus, Taming Silicon Valley: How We Can Ensure That AI Works for Us, MIT Press, 235 pp.; Daniela Rus and Gregory Mone, The Mind's Mirror: Risk and Reward in the Age of AI, Norton, 280 pp.; Madhumita Murgia, Code Dependent: Living in the Shadow of AI, Henry Holt, 311 pp.), The New York Review of Books, vol. LXXI, no. 17 (7 November 2024), pp. 44–46. "'We can't realistically expect that those who hope to get rich from AI are going to have the interests of the rest of us close at heart,' ... writes [Gary Marcus]. 'We can't count on governments driven by campaign finance contributions [from tech companies] to push back.'... Marcus details the demands that citizens should make of their governments and the tech companies. They include transparency on how AI systems work; compensation for individuals if their data [are] used to train LLMs (large language models) and the right to consent to this use; and the ability to hold tech companies liable for the harms they cause by eliminating Section 230, imposing cash penalties, and passing stricter product liability laws... Marcus also suggests... that a new, AI-specific federal agency, akin to the FDA, the FCC, or the FTC, might provide the most robust oversight.... [T]he Fordham law professor Chinmayi Sharma... suggests... establish[ing] a professional licensing regime for engineers that would function in a similar way to medical licenses, malpractice suits, and the Hippocratic oath in medicine. 'What if, like doctors,' she asks..., 'AI engineers also vowed to do no harm?'" (p. 46.) Henderson, Mark (24 April 2007). "Human rights for robots? We're getting carried away". The Times Online. London. Archived from the original on 31 May 2014. Retrieved 31 May 2014.
Hughes-Castleberry, Kenna, "A Murder Mystery Puzzle: The literary puzzle Cain's Jawbone, which has stumped humans for decades, reveals the limitations of natural-language-processing algorithms", Scientific American, vol. 329, no. 4 (November 2023), pp. 81–82. "This murder mystery competition has revealed that although NLP (natural-language processing) models are capable of incredible feats, their abilities are very much limited by the amount of context they receive. This [...] could cause [difficulties] for researchers who hope to use them to do things such as analyze ancient languages. In some cases, there are few historical records on long-gone civilizations to serve as training data for such a purpose." (p. 82.) Immerwahr, Daniel, "Your Lying Eyes: People now use A.I. to generate fake videos indistinguishable from real ones. How much does it matter?", The New Yorker, 20 November 2023, pp. 54–59. "If by 'deepfakes' we mean realistic videos produced using artificial intelligence that actually deceive people, then they barely exist. The fakes aren't deep, and the deeps aren't fake. [...] A.I.-generated videos are not, in general, operating in our media as counterfeited evidence. Their role better resembles that of cartoons, especially smutty ones." (p. 59.) Johnston, John (2008) The Allure of Machinic Life: Cybernetics, Artificial Life, and the New AI, MIT Press. Jumper, John; Evans, Richard; Pritzel, Alexander; et al. (26 August 2021). "Highly accurate protein structure prediction with AlphaFold". Nature. 596 (7873): 583–589. Bibcode:2021Natur.596..583J. doi:10.1038/s41586-021-03819-2. PMC 8371605. PMID 34265844. S2CID 235959867. LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (28 May 2015). "Deep learning". Nature. 521 (7553): 436–444. Bibcode:2015Natur.521..436L. doi:10.1038/nature14539. PMID 26017442. S2CID 3074096. Archived from the original on 5 June 2023. Retrieved 19 June 2023. 
Leffer, Lauren, "The Risks of Trusting AI: We must avoid humanizing machine-learning models used in scientific research", Scientific American, vol. 330, no. 6 (June 2024), pp. 80–81. Lepore, Jill, "The Chit-Chatbot: Is talking with a machine a conversation?", The New Yorker, 7 October 2024, pp. 12–16. Maschafilm (2010). "Content: Plug & Pray Film – Artificial Intelligence – Robots". plugandpray-film.de. Archived from the original on 12 February 2016. Marcus, Gary, "Artificial Confidence: Even the newest, buzziest systems of artificial general intelligence are stymied by the same old problems", Scientific American, vol. 327, no. 4 (October 2022), pp. 42–45. Mitchell, Melanie (2019). Artificial intelligence: a guide for thinking humans. New York: Farrar, Straus and Giroux. ISBN 978-0-3742-5783-5. Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; et al. (26 February 2015). "Human-level control through deep reinforcement learning". Nature. 518 (7540): 529–533. Bibcode:2015Natur.518..529M. doi:10.1038/nature14236. PMID 25719670. S2CID 205242740. Archived from the original on 19 June 2023. Retrieved 19 June 2023. Introduced DQN, which produced human-level performance on some Atari games. Press, Eyal, "In Front of Their Faces: Does facial-recognition technology lead police to ignore contradictory evidence?", The New Yorker, 20 November 2023, pp. 20–26. "Robots could demand legal rights". BBC News. 21 December 2006. Archived from the original on 15 October 2019. Retrieved 3 February 2011. Roivainen, Eka, "AI's IQ: ChatGPT aced a [standard intelligence] test but showed that intelligence cannot be measured by IQ alone", Scientific American, vol. 329, no. 1 (July/August 2023), p. 7. "Despite its high IQ, ChatGPT fails at tasks that require real humanlike reasoning or an understanding of the physical and social world.... ChatGPT seemed unable to reason logically and tried to rely on its vast database of... facts derived from online texts."
Scharre, Paul, "Killer Apps: The Real Dangers of an AI Arms Race", Foreign Affairs, vol. 98, no. 3 (May/June 2019), pp. 135–144. "Today's AI technologies are powerful but unreliable. Rules-based systems cannot deal with circumstances their programmers did not anticipate. Learning systems are limited by the data on which they were trained. AI failures have already led to tragedy. Advanced autopilot features in cars, although they perform well in some circumstances, have driven cars without warning into trucks, concrete barriers, and parked cars. In the wrong situation, AI systems go from supersmart to superdumb in an instant. When an enemy is trying to manipulate and hack an AI system, the risks are even greater." (p. 140.) Schulz, Hannes; Behnke, Sven (1 November 2012). "Deep Learning". KI – Künstliche Intelligenz. 26 (4): 357–363. doi:10.1007/s13218-012-0198-z. ISSN 1610-1987. S2CID 220523562. Serenko, Alexander; Michael Dohan (2011). "Comparing the expert survey and citation impact journal ranking methods: Example from the field of Artificial Intelligence" (PDF). Journal of Informetrics. 5 (4): 629–649. doi:10.1016/j.joi.2011.06.002. Archived (PDF) from the original on 4 October 2013. Retrieved 12 September 2013. Silver, David; Huang, Aja; Maddison, Chris J.; et al. (28 January 2016). "Mastering the game of Go with deep neural networks and tree search". Nature. 529 (7587): 484–489. Bibcode:2016Natur.529..484S. doi:10.1038/nature16961. PMID 26819042. S2CID 515925. Archived from the original on 18 June 2023. Retrieved 19 June 2023. Tarnoff, Ben, "The Labor Theory of AI" (review of Matteo Pasquinelli, The Eye of the Master: A Social History of Artificial Intelligence, Verso, 2024, 264 pp.), The New York Review of Books, vol. LXXII, no. 5 (27 March 2025), pp. 30–32. The reviewer, Ben Tarnoff, writes: "The strangeness at the heart of the generative AI boom is that nobody really knows how the technology works. 
We know how the large language models within ChatGPT and its counterparts are trained, even if we don't always know which data they're being trained on: they are asked to predict the next string of characters in a sequence. But exactly how they arrive at any given prediction is a mystery. The computations that occur inside the model are simply too intricate for any human to comprehend." (p. 32.) Vaswani, Ashish, Noam Shazeer, Niki Parmar et al. "Attention is all you need." Advances in neural information processing systems 30 (2017). Seminal paper on transformers. Vincent, James, "Horny Robot Baby Voice: James Vincent on AI chatbots", London Review of Books, vol. 46, no. 19 (10 October 2024), pp. 29–32. "[AI chatbot] programs are made possible by new technologies but rely on the timelelss human tendency to anthropomorphise." (p. 29.) White Paper: On Artificial Intelligence – A European approach to excellence and trust (PDF). Brussels: European Commission. 2020. Archived (PDF) from the original on 20 February 2020. Retrieved 20 February 2020. External links Artificial intelligence at Wikipedia's sister projects Definitions from Wiktionary Media from Commons Quotations from Wikiquote Textbooks from Wikibooks Resources from Wikiversity Data from Wikidata Scholia has a topic profile for Artificial intelligence. "Artificial Intelligence". Internet Encyclopedia of Philosophy. vte Artificial intelligence (AI) Articles related to artificial intelligence icon Authority control databases Edit this at Wikidata Categories: Artificial intelligenceComputational fields of studyComputational neuroscienceCyberneticsData scienceFormal sciencesIntelligence by type This page was last edited on 1 August 2025, at 07:39 (UTC). Text is available under the Creative Commons Attribution-ShareAlike 4.0 License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. 
Transformer (deep learning architecture)

From Wikipedia, the free encyclopedia

Figure: A standard Transformer architecture, showing on the left an encoder, and on the right a decoder. Note: it uses the pre-LN convention, which is different from the post-LN convention used in the original 2017 Transformer.
In deep learning, the transformer is an architecture based on the multi-head attention mechanism, in which text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table.[1] At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished.

Transformers have the advantage of having no recurrent units, and therefore require less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM).[2] Later variations have been widely adopted for training large language models (LLMs) on large (language) datasets.[3]

The modern version of the transformer was proposed in the 2017 paper "Attention Is All You Need" by researchers at Google.[1] Transformers were first developed as an improvement over previous architectures for machine translation,[4][5] but have found many applications since. They are used in large-scale natural language processing, computer vision (vision transformers), reinforcement learning,[6][7] audio,[8] multimodal learning, robotics,[9] and even playing chess.[10] The architecture has also led to the development of pre-trained systems, such as generative pre-trained transformers (GPTs)[11] and BERT[12] (bidirectional encoder representations from transformers).

History

See also: Timeline of machine learning

Predecessors

For many years, sequence modelling and generation was done by using plain recurrent neural networks (RNNs). A well-cited early example was the Elman network (1990). In theory, the information from one token can propagate arbitrarily far down the sequence, but in practice the vanishing-gradient problem leaves the model's state at the end of a long sentence without precise, extractable information about preceding tokens.
A key breakthrough was LSTM (1995),[note 1] an RNN which used various innovations to overcome the vanishing gradient problem, allowing efficient learning of long-sequence modelling. One key innovation was the use of an attention mechanism which used neurons that multiply the outputs of other neurons, so-called multiplicative units.[13] Neural networks using multiplicative units were later called sigma-pi networks[14] or higher-order networks.[15] LSTM became the standard architecture for long sequence modelling until the 2017 publication of Transformers. However, LSTM still used sequential processing, like most other RNNs.[note 2] Specifically, RNNs operate one token at a time from first to last; they cannot operate in parallel over all tokens in a sequence.

Modern Transformers overcome this problem, but unlike RNNs, they require computation time that is quadratic in the size of the context window. The linearly scaling fast weight controller (1992) learns to compute a weight matrix for further processing depending on the input.[16] One of its two networks has "fast weights" or "dynamic links" (1981).[17][18][19] A slow neural network learns by gradient descent to generate keys and values for computing the weight changes of the fast neural network, which computes answers to queries.[16] This was later shown to be equivalent to the unnormalized linear Transformer.[20][21]

Attention with seq2seq

Main article: Seq2seq § History

The idea of encoder-decoder sequence transduction had been developed in the early 2010s; two concurrently published papers from 2014 are commonly cited as the originators of seq2seq.[22][23]

A 380M-parameter model for machine translation used two long short-term memories (LSTMs).[23] Its architecture consists of two parts. The encoder is an LSTM that takes in a sequence of tokens and turns it into a vector. The decoder is another LSTM that converts the vector into a sequence of tokens.
Similarly, another 130M-parameter model used gated recurrent units (GRU) instead of LSTM.[22] Later research showed that GRUs are neither better nor worse than LSTMs for seq2seq.[24][25]

These early seq2seq models had no attention mechanism, and the state vector was accessible only after the last word of the source text had been processed. Although in theory such a vector retains the information about the whole original sentence, in practice the information is poorly preserved. This is because the input is processed sequentially by one recurrent network into a fixed-size output vector, which is then processed by another recurrent network into an output. If the input is long, then the output vector would not be able to contain all relevant information, degrading the output. As evidence, reversing the input sentence improved seq2seq translation.[26]

The RNNsearch model introduced an attention mechanism to seq2seq for machine translation to solve the bottleneck problem (of the fixed-size output vector), allowing the model to process long-distance dependencies more easily. The name comes from the fact that it "emulates searching through a source sentence during decoding a translation".[4]

A comparison of global (that of RNNsearch) and local (sliding window) attention model architectures for machine translation found that mixed attention had higher quality than global attention, while local attention reduced translation time.[27]

In 2016, Google Translate was revamped to Google Neural Machine Translation, which replaced the previous model based on statistical machine translation.
The new model was a seq2seq model where the encoder and the decoder were both 8 layers of bidirectional LSTM.[28] It took nine months to develop, and it outperformed the statistical approach, which took ten years to develop.[29]

Parallelizing attention

Main article: Attention (machine learning) § History

Seq2seq models with attention (including self-attention) still suffered from the same issue as recurrent networks: they are hard to parallelize, which prevented them from being accelerated on GPUs. In 2016, decomposable attention applied a self-attention mechanism to feedforward networks, which are easy to parallelize, and achieved a state-of-the-art (SOTA) result in textual entailment with an order of magnitude fewer parameters than LSTMs.[30] One of its authors, Jakob Uszkoreit, suspected that attention without recurrence would be sufficient for language translation, thus the title "attention is all you need".[31] That hypothesis was against conventional wisdom at the time, and even his father Hans Uszkoreit, a well-known computational linguist, was skeptical.[31] In the same year, self-attention (called intra-attention or intra-sentence attention) was proposed for LSTMs.[32]

In 2017, the original (100M-sized) encoder-decoder transformer model was proposed in the "Attention is all you need" paper. At the time, the focus of the research was on improving seq2seq for machine translation, by removing its recurrence to process all tokens in parallel, but preserving its dot-product attention mechanism to keep its text processing performance.[1] This led to the introduction of a multi-head attention model that was easier to parallelize due to the use of independent heads and the lack of recurrence.
Its parallelizability was an important factor in its widespread use in large neural networks.[33]

AI boom era

Already in spring 2017, even before the "Attention is all you need" preprint was published, one of the co-authors applied the "decoder-only" variation of the architecture to generate fictitious Wikipedia articles.[34] Transformer architecture is now used alongside many generative models that contribute to the ongoing AI boom.

In language modelling, ELMo (2018) was a bi-directional LSTM that produces contextualized word embeddings, improving upon the line of research from bag of words and word2vec. It was followed by BERT (2018), an encoder-only Transformer model.[35] In October 2019, Google started using BERT to process search queries.[36] In 2020, Google Translate replaced the previous RNN-encoder–RNN-decoder model by a Transformer-encoder–RNN-decoder model.[37]

Starting in 2018, the OpenAI GPT series of decoder-only Transformers became state of the art in natural language generation. In 2022, a chatbot based on GPT-3, ChatGPT, became unexpectedly[38] popular, triggering a boom around large language models.[39][40]

Since 2020, Transformers have been applied in modalities beyond text, including the vision transformer,[41] speech recognition,[42] robotics,[6] and multimodal learning.[43] The vision transformer, in turn, stimulated new developments in convolutional neural networks.[44] Image and video generators like DALL-E (2021), Stable Diffusion 3 (2024),[45] and Sora (2024) use Transformers to analyse input data (like text prompts) by breaking it down into "tokens" and then calculating the relevance between each token using self-attention, which helps the model understand the context and relationships within the data.

Training

Methods for stabilizing training

The plain transformer architecture had difficulty converging. In the original paper[1] the authors recommended using learning rate warmup.
That is, the learning rate should linearly scale up from 0 to its maximal value for the first part of the training (usually recommended to be 2% of the total number of training steps), before decaying again.

A 2020 paper found that using layer normalization before (instead of after) multiheaded attention and feedforward layers stabilizes training, not requiring learning rate warmup.[46]

Pretrain-finetune

Transformers typically are first pretrained by self-supervised learning on a large generic dataset, followed by supervised fine-tuning on a small task-specific dataset. The pretrain dataset is typically an unlabeled large corpus, such as The Pile. Tasks for pretraining and fine-tuning commonly include:

- language modeling[12]
- next-sentence prediction[12]
- question answering[3]
- reading comprehension
- sentiment analysis[1]
- paraphrasing[1]

The T5 transformer report[47] documents a large number of natural language pretraining tasks. Some examples are:

- restoring or repairing incomplete or corrupted text. For example, the input, "Thank you ~~ me to your party ~~ week", might generate the output, "Thank you for inviting me to your party last week".
- translation between natural languages (machine translation)
- judging the pragmatic acceptability of natural language. For example, the following sentence might be judged "not acceptable",[48] because even though it is syntactically well-formed, it is improbable in ordinary human usage: The course is jumping well.

Note that while each of these tasks is trivial or obvious for human native speakers of the language (or languages), they have typically proved challenging for previous generations of machine learning architecture.

Tasks

See also: Large language model § Evaluation

In general, there are three classes of language modelling tasks: "masked",[49] "autoregressive",[50] and "prefixLM".[51] These classes are independent of a specific modeling architecture such as Transformer, but they are often discussed in the context of Transformer.
In a masked task,[49] one or more of the tokens is masked out, and the model produces a probability distribution predicting what the masked-out tokens are based on the context. The loss function for the task is typically the sum of log-perplexities for the masked-out tokens:

$$\text{Loss} = -\sum_{t \in \text{masked tokens}} \ln(\text{probability of } t \text{ conditional on its context})$$

and the model is trained to minimize this loss function. The BERT series of models are trained for masked token prediction and another task.

In an autoregressive task,[50] the entire sequence is masked at first, and the model produces a probability distribution for the first token. Then the first token is revealed and the model predicts the second token, and so on. The loss function for the task is still typically the same. The GPT series of models are trained by autoregressive tasks.

In a prefixLM task,[51] the sequence is divided into two parts. The first part is presented as context, and the model predicts the first token of the second part. Then that token is revealed, and the model predicts the second token, and so on. The loss function for the task is still typically the same. The T5 series of models are trained by prefixLM tasks.

Note that "masked" as in "masked language modelling" is not "masked" as in "masked attention", and "prefixLM" (prefix language modeling) is not "prefixLM" (prefix language model).

Architecture

All transformers have the same primary components:

- Tokenizers, which convert text into tokens.
- Embedding layer, which converts tokens and positions of the tokens into vector representations.
- Transformer layers, which carry out repeated transformations on the vector representations, extracting more and more linguistic information. These consist of alternating attention and feedforward layers.
There are two major types of transformer layers: encoder layers and decoder layers, with further variants.
- Un-embedding layer, which converts the final vector representations back to a probability distribution over the tokens.

The following description follows exactly the Transformer as described in the original paper. There are variants, described in the following section.

By convention, we write all vectors as row vectors. This, for example, means that pushing a vector through a linear layer means multiplying it by a weight matrix on the right, as $xW$.

Tokenization

Main article: Lexical analysis

As the Transformer architecture natively processes numerical data, not text, there must be a translation between text and tokens. A token is an integer that represents a character, or a short segment of characters. On the input side, the input text is parsed into a token sequence. Similarly, on the output side, the output tokens are parsed back to text. The module doing the conversion between texts and token sequences is a tokenizer.

The set of all tokens is the vocabulary of the tokenizer, and its size is the vocabulary size $n_{\text{vocabulary}}$. When faced with tokens outside the vocabulary, typically a special token is used, written as "[UNK]" for "unknown".

Some commonly used tokenizers are byte pair encoding, WordPiece, and SentencePiece.

Embedding

Further information: Word embedding

Each token is converted into an embedding vector via a lookup table. Equivalently stated, it multiplies a one-hot representation of the token by an embedding matrix $M$.
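The tokenization and embedding steps above can be illustrated with a small sketch. It is not how production tokenizers work: the whitespace tokenizer is a stand-in assumption for schemes such as byte pair encoding, and the names `VOCAB`, `TOKEN_ID`, `tokenize`, `embed_lookup`, `embed_one_hot` and the values in `M` are all hypothetical. The aim is only to show that a table lookup and a one-hot row vector multiplied by $M$ give the same embedding.

```python
# Toy vocabulary (assumption): token ids are the list indices.
# "[UNK]" stands in for any word that is not in the vocabulary.
VOCAB = ["[UNK]", "man", "bites", "dog"]
TOKEN_ID = {word: i for i, word in enumerate(VOCAB)}

# Hypothetical embedding matrix M with shape (n_vocabulary, d_emb) = (4, 3).
M = [
    [0.0, 0.0, 0.0],
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
]


def tokenize(text):
    """Whitespace tokenizer: map each word to its id, or to [UNK] if unseen."""
    return [TOKEN_ID.get(word, TOKEN_ID["[UNK]"]) for word in text.split()]


def embed_lookup(token):
    """Embedding as a table lookup: row number `token` of M."""
    return M[token]


def embed_one_hot(token):
    """Embedding as a one-hot row vector multiplied by M on the right."""
    one_hot = [1.0 if i == token else 0.0 for i in range(len(M))]
    d_emb = len(M[0])
    return [sum(one_hot[i] * M[i][j] for i in range(len(M))) for j in range(d_emb)]


tokens = tokenize("man bites cat")  # "cat" is out of vocabulary
print(tokens)                       # [1, 2, 0]
print(embed_lookup(3) == embed_one_hot(3))  # True
```

In practice the lookup form is what gets executed; the one-hot product is mainly useful for reasoning about the layer as a linear map.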
For example, if the input token is $3$, then the one-hot representation is $[0, 0, 0, 1, 0, 0, \dots]$, and its embedding vector is

$$\mathrm{Embed}(3) = [0, 0, 0, 1, 0, 0, \dots] M$$

The token embedding vectors are added to their respective positional encoding vectors (see below), producing the sequence of input vectors.

The number of dimensions in an embedding vector is called hidden size or embedding size and written as $d_{\text{emb}}$.[35] This size is written as $d_{\text{model}}$ in the original Transformer paper.[1]

Un-embedding

An un-embedding layer is almost the reverse of an embedding layer. Whereas an embedding layer converts a token into a vector, an un-embedding layer converts a vector into a probability distribution over tokens.

The un-embedding layer is a linear-softmax layer:

$$\mathrm{UnEmbed}(x) = \mathrm{softmax}(xW + b)$$

The matrix $W$ has shape $(d_{\text{emb}}, n_{\text{vocabulary}})$. The embedding matrix $M$ and the un-embedding matrix $W$ are sometimes required to be transposes of each other, a practice called weight tying.[52]

Positional encoding

Figure: A diagram of a sinusoidal positional encoding with parameters $N = 10000, d = 100$.

A positional encoding is a fixed-size vector representation of the relative positions of tokens within a sequence: it provides the transformer model with information about where the words are in the input sequence. This induces a bias towards the order of the input sequence, so that, for example, the input sequence "man bites dog" is processed differently from "dog bites man".
The positional encoding is defined as a function of type $f: \mathbb{R} \to \mathbb{R}^d; d \in \mathbb{Z}, d > 0$, where $d$ is a positive even integer. The full positional encoding defined in the original paper[1] is:

$$(f(t)_{2k}, f(t)_{2k+1}) = (\sin(\theta), \cos(\theta)) \quad \forall k \in \{0, 1, \ldots, d/2 - 1\}$$

where $\theta = \frac{t}{r^k}, r = N^{2/d}$.

Here, $N$ is a free parameter that should be significantly larger than the biggest $t$ that would be input into the positional encoding function. The original paper uses $N = 10000$.

The function is in a simpler form when written as a complex function of type $f: \mathbb{R} \to \mathbb{C}^{d/2}$

$$f(t) = \left(e^{it/r^k}\right)_{k = 0, 1, \ldots, \frac{d}{2} - 1}$$

where $r = N^{2/d}$.

The main reason for using this positional encoding function is that using it, shifts are linear transformations:

$$f(t + \Delta t) = \mathrm{diag}(f(\Delta t)) f(t)$$

where $\Delta t \in \mathbb{R}$ is the distance one wishes to shift. This allows the transformer to take any encoded position, and find the encoding of the position n-steps-ahead or n-steps-behind, by a matrix multiplication.

By taking a linear sum, any convolution can also be implemented as linear transformations:

$$\sum_j c_j f(t + \Delta t_j) = \left(\sum_j c_j \, \mathrm{diag}(f(\Delta t_j))\right) f(t)$$

for any constants $c_j$.
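The shift property can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the function names `positional_encoding` and `shift` are hypothetical, and the complex multiplication by $e^{i\Delta t/r^k}$ is written out in real form using the angle-addition identities, one (sin, cos) pair at a time.

```python
import math


def positional_encoding(t, d, N=10000):
    """Return f(t) as d reals: (sin(t / r^k), cos(t / r^k)) for k = 0 .. d/2 - 1, r = N^(2/d)."""
    assert d > 0 and d % 2 == 0, "d must be a positive even integer"
    r = N ** (2 / d)
    out = []
    for k in range(d // 2):
        theta = t / r ** k
        out.extend([math.sin(theta), math.cos(theta)])
    return out


def shift(encoding, dt, N=10000):
    """Map f(t) to f(t + dt) with a fixed linear map that does not depend on t."""
    d = len(encoding)
    r = N ** (2 / d)
    out = []
    for k in range(d // 2):
        s, c = encoding[2 * k], encoding[2 * k + 1]
        phi = dt / r ** k
        # sin(a + b) = sin a cos b + cos a sin b
        # cos(a + b) = cos a cos b - sin a sin b
        out.extend([s * math.cos(phi) + c * math.sin(phi),
                    c * math.cos(phi) - s * math.sin(phi)])
    return out


direct = positional_encoding(8, d=8)
shifted = shift(positional_encoding(5, d=8), dt=3)
print(max(abs(a - b) for a, b in zip(direct, shifted)))  # close to 0, up to rounding
```

Because `shift` only reads `dt`, the same linear map moves every position by the same offset, which is the property the equations above express.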
This allows the transformer to take any encoded position and find a linear sum of the encoded locations of its neighbors. This sum of encoded positions, when fed into the attention mechanism, would create attention weights on its neighbors, much like what happens in a convolutional neural network language model. In the authors' words, "we hypothesized it would allow the model to easily learn to attend by relative position."

In typical implementations, all operations are done over the real numbers, not the complex numbers, but since complex multiplication can be implemented as real 2-by-2 matrix multiplication, this is a mere notational difference.

Encoder-decoder (overview)

Figure: One encoder-decoder block. A Transformer is composed of stacked encoder layers and decoder layers.

Like earlier seq2seq models, the original transformer model used an encoder-decoder architecture. The encoder consists of encoding layers that process all the input tokens together one layer after another, while the decoder consists of decoding layers that iteratively process the encoder's output and the decoder's output tokens so far.

The purpose of each encoder layer is to create contextualized representations of the tokens, where each representation corresponds to a token that "mixes" information from other input tokens via the self-attention mechanism. Each decoder layer contains two attention sublayers: (1) cross-attention for incorporating the output of the encoder (contextualized input token representations), and (2) self-attention for "mixing" information among the input tokens to the decoder (i.e. the tokens generated so far during inference time).[53][54]

Both the encoder and decoder layers have a feed-forward neural network for additional processing of their outputs and contain residual connections and layer normalization steps.[54] These feed-forward layers contain most of the parameters in a Transformer model.

Feedforward network

Figure: The feedforward network module.
It is a two-layered network that maps $d_{\text{emb}}$-dimensional vectors into $d_{\text{emb}}$-dimensional vectors.

The feedforward network (FFN) modules in a Transformer are 2-layered multilayer perceptrons:

$$\mathrm{FFN}(x) = \phi(xW^{(1)} + b^{(1)})W^{(2)} + b^{(2)}$$

where $W^{(1)}$ and $W^{(2)}$ are weight matrices, $b^{(1)}$ and $b^{(2)}$ are bias vectors, and $\phi$ is its activation function. The original Transformer used ReLU activation.

The number of neurons in the middle layer is called intermediate size (GPT),[55] filter size (BERT),[35] or feedforward size (BERT).[35] It is typically larger than the embedding size. For example, in both GPT-2 series and BERT series, the intermediate size of a model is 4 times its embedding size: $d_{\text{ffn}} = 4 d_{\text{emb}}$.

Scaled dot-product attention

Main article: Dot-product attention

Attention head

Figures: Scaled dot-product attention, block diagram. Exact dimension counts within an attention head module.

The attention mechanisms used in the Transformer architecture are scaled dot-product attention units. For each unit, the transformer model learns three weight matrices: the query weights $W^Q$, the key weights $W^K$, and the value weights $W^V$.

The module takes three sequences, a query sequence, a key sequence, and a value sequence. The query sequence is a sequence of length $\ell_{\text{seq, query}}$, and each entry is a vector of dimension $d_{\text{emb, query}}$. Similarly for the key and value sequences.
Each vector $x_{i,\text{query}}$ in the query sequence is multiplied by a matrix $W^Q$ to produce a query vector $q_i = x_{i,\text{query}} W^Q$. The matrix of all query vectors is the query matrix:

$$Q = X_{\text{query}} W^Q$$

Similarly, we construct the key matrix $K = X_{\text{key}} W^K$ and the value matrix $V = X_{\text{value}} W^V$.

It is usually the case that all $W^Q, W^K, W^V$ are square matrices, meaning $d_{\text{emb, query}} = d_{\text{query}}$, etc.

Attention weights are calculated using the query and key vectors: the attention weight $a_{ij}$ from token $i$ to token $j$ is the dot product between $q_i$ and $k_j$. The attention weights are divided by the square root of the dimension of the key vectors, $\sqrt{d_k}$, which stabilizes gradients during training, and passed through a softmax which normalizes the weights. The fact that $W^Q$ and $W^K$ are different matrices allows attention to be non-symmetric: if token $i$ attends to token $j$ (i.e. $q_i \cdot k_j$ is large), this does not necessarily mean that token $j$ will attend to token $i$ (i.e. $q_j \cdot k_i$ could be small). The output of the attention unit for token $i$ is the weighted sum of the value vectors of all tokens, weighted by $a_{ij}$, the attention from token $i$ to each token.
The attention calculation for all tokens can be expressed as one large matrix calculation using the softmax function, which is useful for training because optimized matrix operations compute it quickly. The matrices $Q$, $K$ and $V$ are defined as the matrices where the $i$th rows are vectors $q_i$, $k_i$, and $v_i$ respectively. Then we can represent the attention as

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^{\mathrm{T}}}{\sqrt{d_k}}\right)V$$

where the softmax is applied over each of the rows of the matrix.

The number of dimensions in a query vector is query size $d_{\text{query}}$ and similarly for the key size $d_{\text{key}}$ and value size $d_{\text{value}}$. The output dimension of an attention head is its head dimension $d_{\text{head}}$. The attention mechanism requires the following three equalities to hold:

$$\ell_{\text{seq, key}} = \ell_{\text{seq, value}}, \; d_{\text{query}} = d_{\text{key}}, \; d_{\text{value}} = d_{\text{head}}$$

but is otherwise unconstrained.

If the attention head is used in a self-attention fashion, then $X_{\text{query}} = X_{\text{key}} = X_{\text{value}}$. If the attention head is used in a cross-attention fashion, then usually $X_{\text{query}} \neq X_{\text{key}} = X_{\text{value}}$. It is theoretically possible for all three to be different, but that is rarely the case in practice.
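As a rough illustration of a single attention head, the sketch below implements $\text{softmax}(QK^{\mathrm{T}}/\sqrt{d_k})V$ with plain Python lists so that it has no dependencies; a real implementation would use an array library. The helper names (`matmul`, `transpose`, `softmax`, `attention`) and the toy values of `X`, `W_Q`, `W_K`, `W_V` are assumptions chosen for the example.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def softmax(row):
    """Numerically stable softmax of one row."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V, softmax taken row-wise."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights


# Hypothetical toy inputs: 3 tokens with d_emb = 2 and square projection matrices.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_Q = [[1.0, 0.0], [0.0, 2.0]]
W_K = [[0.0, 1.0], [1.0, 0.0]]
W_V = [[1.0, 0.0], [0.0, 1.0]]

# Self-attention: the query, key and value sequences are all X.
out_self, w_self = attention(matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V))

# Cross-attention: queries come from a different sequence (here of length 1).
X_query = [[0.5, -0.5]]
out_cross, w_cross = attention(matmul(X_query, W_Q), matmul(X, W_K), matmul(X, W_V))

print(len(out_self), len(out_cross))  # 3 1
print(w_self[0][1], w_self[1][0])     # differ, since W_Q and W_K are different matrices
```

The output has one row per query, so the query sequence length is free while the key and value sequences must have equal length, matching the constraints listed above.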
Multiheaded attention

Figures: Multiheaded attention, block diagram. Exact dimension counts within a multiheaded attention module.

One set of $\left(W^Q, W^K, W^V\right)$ matrices is called an attention head, and each layer in a transformer model has multiple attention heads. While each attention head attends to the tokens that are relevant to each token, multiple attention heads allow the model to do this for different definitions of "relevance". Specifically, the query and key projection matrices, $W^Q$ and $W^K$, which are involved in the attention score computation, define the "relevance". Meanwhile, the value projection matrix $W^V$, in combination with the part of the output projection matrix $W^O$, determines how the attended tokens influence what information is passed to subsequent layers and ultimately the output logits. In addition, the scope of attention, or the range of token relationships captured by each attention head, can expand as tokens pass through successive layers. This allows the model to capture more complex and long-range dependencies in deeper layers.

Many transformer attention heads encode relevance relations that are meaningful to humans. For example, some attention heads can attend mostly to the next word, while others mainly attend from verbs to their direct objects.[56] The computations for each attention head can be performed in parallel, which allows for fast processing. The outputs for the attention layer are concatenated to pass into the feed-forward neural network layers.
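The concatenation of per-head outputs followed by the output projection $W^O$ can be sketched as follows. This is an illustrative sketch in dependency-free Python, not an optimized implementation: the helper names, the use of `random.Random(0)` to fill the weight matrices, and the toy sizes (3 tokens, $d_{\text{emb}} = 4$, 2 heads of dimension 2) are all assumptions.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention for one head."""
    d_k = len(K[0])
    scores = matmul(Q, [list(col) for col in zip(*K)])
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)


def multihead_attention(X, heads, W_O):
    """heads is a list of (W_Q, W_K, W_V) triples, one triple per attention head."""
    head_outputs = [attention(matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V))
                    for W_Q, W_K, W_V in heads]
    # Concatenate the head outputs along the feature axis, token by token.
    concat = [[v for out in head_outputs for v in out[t]] for t in range(len(X))]
    return matmul(concat, W_O)


rng = random.Random(0)  # fixed seed so the toy weights are reproducible


def random_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


n_tokens, d_emb, n_heads, d_head = 3, 4, 2, 2
X = random_matrix(n_tokens, d_emb)
heads = [tuple(random_matrix(d_emb, d_head) for _ in range(3)) for _ in range(n_heads)]
W_O = random_matrix(n_heads * d_head, d_emb)

Y = multihead_attention(X, heads, W_O)
print(len(Y), len(Y[0]))  # 3 4
```

Each head is computed independently of the others, which is why the per-head loop could run in parallel.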
Concretely, let the multiple attention heads be indexed by $i$; then we have

$$\text{MultiheadedAttention}(Q,K,V)=\text{Concat}_{i\in[n_{\text{heads}}]}\left(\text{Attention}(XW_{i}^{Q},XW_{i}^{K},XW_{i}^{V})\right)W^{O}$$

where the matrix $X$ is the concatenation of word embeddings, the matrices $W_{i}^{Q},W_{i}^{K},W_{i}^{V}$ are "projection matrices" owned by individual attention head $i$, and $W^{O}$ is a final projection matrix owned by the whole multiheaded attention module.

It is theoretically possible for each attention head to have a different head dimension $d_{\text{head}}$, but that is rarely the case in practice.

As an example, in the smallest GPT-2 model, there are only self-attention mechanisms. It has the following dimensions:

$$d_{\text{emb}}=768,\;n_{\text{head}}=12,\;d_{\text{head}}=64$$

Since $12\times 64=768$, its output projection matrix $W^{O}\in\mathbb{R}^{(12\times 64)\times 768}$ is a square matrix.

Masked attention

The Transformer architecture is constructed to calculate output tokens iteratively. Assuming $t=0$ refers to the calculation of the first output token $i=0$, for step $t>0$, the output token $i=0$ shall remain constant.
This ensures properties of the model similar to autoregressive models.[1] Therefore, at every time step $t$, the calculation for all outputs $i$ should not have access to tokens at position $j$ for $j\geq i$ (as is naturally the case for time step $t=i$, when tokens $j>t$ are not yet calculated). This behavior may be accomplished before the softmax stage by adding a mask matrix $M$ that is $-\infty$ at entries where the attention link must be cut, and $0$ at other places:

$$\text{MaskedAttention}(Q,K,V)=\text{softmax}\left(M+\frac{QK^{\mathrm{T}}}{\sqrt{d_k}}\right)V$$

The following matrix, called "causal masking", is commonly used in decoder self-attention modules:

$$M_{\text{causal}}=\begin{bmatrix}0&-\infty&-\infty&\dots&-\infty\\0&0&-\infty&\dots&-\infty\\0&0&0&\dots&-\infty\\\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&0&\dots&0\end{bmatrix}$$

In words, it means that each token can pay attention to itself, and every token before it, but not any after it. A non-masked attention module can be thought of as a masked attention module where the mask has all entries zero. As an example of an uncommon use of a mask matrix, XLNet considers all masks of the form $PM_{\text{causal}}P^{-1}$, where $P$ is a random permutation matrix.[57]

Encoder

Figure: One encoder layer.

An encoder consists of an embedding layer, followed by multiple encoder layers.

Each encoder layer consists of two major components: a self-attention mechanism and a feed-forward layer.
It takes as input a sequence of input vectors, applies the self-attention mechanism to produce an intermediate sequence of vectors, then applies the feed-forward layer to each vector individually. Schematically, we have:

$$\begin{aligned}\text{given input vectors }&h_{0},h_{1},\dots\\\text{combine them into a matrix }H&=\begin{bmatrix}h_{0}\\h_{1}\\\vdots\end{bmatrix}\\\text{EncoderLayer}(H)&=\begin{bmatrix}\text{FFN}(\text{MultiheadedAttention}(H,H,H)_{0})\\\text{FFN}(\text{MultiheadedAttention}(H,H,H)_{1})\\\vdots\end{bmatrix}\end{aligned}$$

where $\text{FFN}$ stands for "feed-forward network". We can more succinctly write it as

$$\text{EncoderLayer}(H)=\text{FFN}(\text{MultiheadedAttention}(H,H,H))$$

with the implicit convention that the $\text{FFN}$ is applied to each row of the matrix individually.

The encoder layers are stacked. The first encoder layer takes the sequence of input vectors from the embedding layer, producing a sequence of vectors. This sequence of vectors is processed by the second encoder layer, and so on. The output from the final encoder layer is then used by the decoder.

As the encoder processes the entire input all at once, every token can attend to every other token (all-to-all attention), so there is no need for causal masking.

Decoder

Figure: One decoder layer.

A decoder consists of an embedding layer, followed by multiple decoder layers, followed by an un-embedding layer.

Each decoder layer consists of three major components: a causally masked self-attention mechanism, a cross-attention mechanism, and a feed-forward neural network.
The decoder functions in a similar fashion to the encoder, but an additional attention mechanism is inserted which draws relevant information from the encodings generated by the encoders. This mechanism can also be called the encoder-decoder attention.[1][54]

Like the first encoder, the first decoder takes positional information and embeddings of the output sequence as its input, rather than encodings. The transformer must not use the current or future output to predict an output, so the output sequence must be partially masked to prevent this reverse information flow.[1] This allows for autoregressive text generation. For decoding, all-to-all attention is inappropriate, because a token cannot attend to tokens not yet generated. Thus, the self-attention module in the decoder is causally masked.

In contrast, the cross-attention mechanism attends to the output vectors of the encoder, which are computed before the decoder starts decoding. Consequently, there is no need for masking in the cross-attention mechanism.

Schematically, we have:

$$\begin{aligned}H'&=\text{MaskedMultiheadedAttention}(H,H,H)\\\text{DecoderLayer}(H)&=\text{FFN}(\text{MultiheadedAttention}(H',H^{E},H^{E}))\end{aligned}$$

where $H^{E}$ is the matrix whose rows are the output vectors from the encoder.

The last decoder is followed by a final un-embedding layer to produce the output probabilities over the vocabulary. Then, one of the tokens is sampled according to the probabilities, and the decoder can be run again to produce the next token, and so on, autoregressively generating output text.

Adapted architectures

Many large language models, since they do not need to predict a whole new sequence from an input sequence, only use the encoder or decoder of the original transformer architecture.
Early GPT models are decoder-only models trained to predict the next token in a sequence.[58] BERT, another language model, only makes use of an encoder, and is trained to predict a randomly masked token in a sequence.[35]

Full transformer architecture

Sublayers

Figure: (a) One encoder layer and one decoder layer. (b) Two encoder layers and two decoder layers. The sublayers are labelled as well.

Each encoder layer contains 2 sublayers: the self-attention and the feedforward network. Each decoder layer contains 3 sublayers: the causally masked self-attention, the cross-attention, and the feedforward network.

Figure: Transformer encoder with norm-first and norm-last.
Figure: Transformer decoder with norm-first and norm-last.
Figure: Block diagram for the full Transformer architecture.
Figure: Schematic object hierarchy for the full Transformer architecture, in object-oriented programming style.

The final points of detail are the residual connections and layer normalization (LayerNorm, or LN), which while conceptually unnecessary, are necessary for numerical stability and convergence.

The residual connection, which is introduced to avoid vanishing gradient issues and stabilize the training process, can be expressed as follows: $y=F(x)+x$. The expression indicates that an output $y$ is the sum of the transformation of input $x$ ($F(x)$) and the input itself ($x$). Adding the input $x$ can preserve the input information and avoid issues when the gradient of $F(x)$ is close to zero.

Similarly to how the feedforward network modules are applied individually to each vector, the LayerNorm is also applied individually to each vector.

There are two common conventions in use: the post-LN and the pre-LN convention. In the post-LN convention, the output of each sublayer is

$$\mathrm{LayerNorm}(x+\mathrm{Sublayer}(x))$$

where $\mathrm{Sublayer}(x)$ is the function implemented by the sublayer itself.
In the pre-LN convention, the output of each sublayer is

$$x+\mathrm{Sublayer}(\mathrm{LayerNorm}(x))$$

The original 2017 Transformer used the post-LN convention. It was difficult to train and required careful hyperparameter tuning and a "warm-up" in learning rate, where it starts small and gradually increases. The pre-LN convention, proposed several times in 2018,[59] was found to be easier to train, requiring no warm-up, leading to faster convergence.[46]

Pseudocode

The following is the pseudocode for a standard pre-LN encoder-decoder Transformer, adapted from [60]:

    input: Encoder input t_e
           Decoder input t_d
    output: Array of probability distributions, with shape
            (decoder vocabulary size x length(decoder output sequence))

    /* encoder */
    z_e ← encoder.tokenizer(t_e)

    for each t in 1:length(z_e) do
        z_e[t] ← encoder.embedding(z_e[t]) + encoder.positional_embedding(t)

    for each l in 1:length(encoder.layers) do
        layer ← encoder.layers[l]

        /* first sublayer */
        z_e_copy ← copy(z_e)
        for each t in 1:length(z_e) do
            z_e[t] ← layer.layer_norm(z_e[t])
        z_e ← layer.multiheaded_attention(z_e, z_e, z_e)
        for each t in 1:length(z_e) do
            z_e[t] ← z_e[t] + z_e_copy[t]

        /* second sublayer */
        z_e_copy ← copy(z_e)
        for each t in 1:length(z_e) do
            z_e[t] ← layer.layer_norm(z_e[t])
        z_e ← layer.feedforward(z_e)
        for each t in 1:length(z_e) do
            z_e[t] ← z_e[t] + z_e_copy[t]

    for each t in 1:length(z_e) do
        z_e[t] ← encoder.final_layer_norm(z_e[t])

    /* decoder */
    z_d ← decoder.tokenizer(t_d)

    for each t in 1:length(z_d) do
        z_d[t] ← decoder.embedding(z_d[t]) + decoder.positional_embedding(t)

    for each l in 1:length(decoder.layers) do
        layer ← decoder.layers[l]

        /* first sublayer */
        z_d_copy ← copy(z_d)
        for each t in 1:length(z_d) do
            z_d[t] ← layer.layer_norm(z_d[t])
        z_d ← layer.masked_multiheaded_attention(z_d, z_d, z_d)
        for each t in 1:length(z_d) do
            z_d[t] ← z_d[t] + z_d_copy[t]

        /* second sublayer */
        z_d_copy ← copy(z_d)
        for each t in 1:length(z_d) do
            z_d[t] ← layer.layer_norm(z_d[t])
        z_d ← layer.multiheaded_attention(z_d, z_e, z_e)
        for each t in 1:length(z_d) do
            z_d[t] ← z_d[t] + z_d_copy[t]

        /* third sublayer */
        z_d_copy ← copy(z_d)
        for each t in 1:length(z_d) do
            z_d[t] ← layer.layer_norm(z_d[t])
        z_d ← layer.feedforward(z_d)
        for each t in 1:length(z_d) do
            z_d[t] ← z_d[t] + z_d_copy[t]

    z_d ← decoder.final_layer_norm(z_d)

    output_distributions ← []
    for each t in 1:length(z_d) do
        output_distributions.append(decoder.unembed(z_d[t]))

    return output_distributions

Terminology

The Transformer architecture, being modular, allows variations. Several common variations are described here.[61]

An "encoder-only" Transformer applies the encoder to map an input text into a sequence of vectors that represent the input text. This is usually used for text embedding and representation learning for downstream applications. BERT is encoder-only. Encoder-only models are less often used currently, as they were found to be not significantly better than training an encoder-decoder Transformer, then taking just the encoder.[51]

A "decoder-only" Transformer is not literally decoder-only, since without an encoder, the cross-attention mechanism has nothing to attend to. Thus, the decoder layers in a decoder-only Transformer are composed of just two sublayers: the causally masked self-attention, and the feedforward network. This is usually used for text generation and instruction following. The models in the GPT series and Chinchilla series are decoder-only.

An "encoder-decoder" Transformer is generally the same as the original Transformer, with 2 sublayers per encoder layer and 3 sublayers per decoder layer, etc. Such models might have minor architectural improvements, such as alternative activation functions, changing the location of normalization, etc. This is also usually used for text generation and instruction following.
The models in the T5 series are encoder-decoder.[61]

A "prefixLM" (prefix language model) is a decoder-only architecture, but with prefix masking, which is different from causal masking. Specifically, it has a mask of the form[61]: Figure 3

$$M_{\text{prefixLM}}=\begin{bmatrix}\mathbf{0}&-\infty\\\mathbf{0}&M_{\text{causal}}\end{bmatrix}$$

where the first columns correspond to the "prefix", and the subsequent columns correspond to the autoregressively generated text based on the prefix. They resemble encoder-decoder models, but have less "sparsity". Such models are rarely used, though they are cited as theoretical possibilities and benchmarked comparisons.[51]

There are also mixed seq2seq models. For example, in 2020, Google Translate replaced the previous RNN-encoder–RNN-decoder model by a Transformer-encoder–RNN-decoder model, on the argument that an RNN-decoder runs much faster than a Transformer-decoder when run autoregressively.[62]

Subsequent work

Alternative activation functions

The original transformer uses the ReLU activation function. Other activation functions were developed. The Llama series and PaLM used SwiGLU;[63] both GPT-1 and BERT[35] used GELU.[64]

Alternative activation functions are often used in combination with Gated Linear Units in the feedforward module.[63]

Alternative normalizations

The normalization used in the Transformer can be different from LayerNorm. One example is RMSNorm,[65] which is used in the Llama series.
Other examples include CapsuleNorm,[66] ScaleNorm,[67] or FixNorm.[67]

Alternative positional encodings

Transformers may use other positional encoding methods than sinusoidal.[68]

The original Transformer paper reported using a learned positional encoding,[69] but finding it not superior to the sinusoidal one.[1] Later work[70] found that causal masking itself provides enough signal to a Transformer decoder that it can learn to implicitly perform absolute positional encoding without the positional encoding module.

RoPE

RoPE (rotary positional embedding)[71] is best explained by considering a list of 2-dimensional vectors $[(x_{1}^{(1)},x_{1}^{(2)}),(x_{2}^{(1)},x_{2}^{(2)}),(x_{3}^{(1)},x_{3}^{(2)}),\dots]$. Now pick some angle $\theta$. Then the RoPE encoding is

$$\text{RoPE}\big(x_{m}^{(1)},x_{m}^{(2)},m\big)=\begin{pmatrix}\cos m\theta&-\sin m\theta\\\sin m\theta&\cos m\theta\end{pmatrix}\begin{pmatrix}x_{m}^{(1)}\\x_{m}^{(2)}\end{pmatrix}=\begin{pmatrix}x_{m}^{(1)}\cos m\theta-x_{m}^{(2)}\sin m\theta\\x_{m}^{(2)}\cos m\theta+x_{m}^{(1)}\sin m\theta\end{pmatrix}$$

Equivalently, if we write the 2-dimensional vectors as complex numbers $z_{m}:=x_{m}^{(1)}+ix_{m}^{(2)}$, then the RoPE encoding is just multiplication by a unit complex number, that is, rotation by the angle $m\theta$:

$$\text{RoPE}\big(z_{m},m\big)=e^{im\theta}z_{m}$$

For a list of $2n$-dimensional vectors, a RoPE encoder is defined by a sequence of angles $\theta^{(1)},\dots,\theta^{(n)}$. Then the RoPE encoding is applied to each pair of coordinates.
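The pairwise rotation can be written out directly. The sketch below is only an illustration of the formula above: the function name `rope`, the angle list `thetas` and the sample vectors are hypothetical values chosen for the example. The final lines compare a dot product before and after both positions are shifted by the same offset.

```python
import math

def rope(x, position, thetas):
    """Rotate each coordinate pair (x[2i], x[2i+1]) by the angle position * thetas[i]."""
    out = []
    for i, theta in enumerate(thetas):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(position * theta), math.sin(position * theta)
        out += [a * c - b * s, b * c + a * s]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

thetas = [1.0, 0.1]            # hypothetical angles for a 4-dimensional vector
x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, -1.0, 2.0, 0.0]

base = dot(rope(x, 3, thetas), rope(y, 7, thetas))
shifted = dot(rope(x, 3 + 10, thetas), rope(y, 7 + 10, thetas))
```

Since each pair is only rotated, the length of the vector is unchanged, and `base` and `shifted` agree up to floating-point rounding.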
The benefit of RoPE is that the dot-product between two vectors depends on their relative location only:

$$\text{RoPE}\big(x,m\big)^{T}\text{RoPE}\big(y,n\big)=\text{RoPE}\big(x,m+k\big)^{T}\text{RoPE}\big(y,n+k\big)$$

for any integer $k$.

ALiBi

ALiBi (Attention with Linear Biases)[72] is not a replacement for the positional encoder on the original transformer. Instead, it is an additional positional encoder that is directly plugged into the attention mechanism. Specifically, the ALiBi attention mechanism is

$$\text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^{\mathrm{T}}}{\sqrt{d_k}}+sB\right)V$$

Here, $s$ is a real number ("scalar"), and $B$ is the linear bias matrix defined by

$$B=\begin{pmatrix}0&1&2&3&\cdots\\-1&0&1&2&\cdots\\-2&-1&0&1&\cdots\\-3&-2&-1&0&\cdots\\\vdots&\vdots&\vdots&\vdots&\ddots\end{pmatrix}$$

in other words, $B_{i,j}=j-i$. The idea is that the linear bias matrix is a softened mask. Just as $0$ represents full attention paid, and $-\infty$ represents no attention paid, the linear bias matrix increases attention paid in one direction and decreases attention paid in the other direction.

ALiBi allows pretraining on short context windows, then fine-tuning on longer context windows. Since it is directly plugged into the attention mechanism, it can be combined with any positional encoder that is plugged into the "bottom" of the entire network (which is where the sinusoidal encoder on the original transformer, as well as RoPE and many others, are located).
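To make the "softened mask" reading concrete, the following sketch builds the bias matrix $B_{i,j}=j-i$ and applies it to a block of flat attention scores. The names `alibi_bias` and `biased_weights`, the value `s=0.5` and the all-zero scores are assumptions made for illustration; in a real model the scores would be $QK^{\mathrm{T}}/\sqrt{d_k}$ and $s$ would be set per head.

```python
import math

def softmax(row):
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def alibi_bias(n):
    """Linear bias matrix with entries B[i][j] = j - i."""
    return [[j - i for j in range(n)] for i in range(n)]

def biased_weights(scores, s):
    """Row-wise softmax of scores + s * B for a square score matrix."""
    B = alibi_bias(len(scores))
    return [softmax([score + s * b for score, b in zip(row, bias_row)])
            for row, bias_row in zip(scores, B)]

n = 4
B = alibi_bias(n)
flat_scores = [[0.0] * n for _ in range(n)]   # stand-in for Q K^T / sqrt(d_k)
weights = biased_weights(flat_scores, s=0.5)
```

With flat scores and a positive `s`, every row of `weights` grows with the column index, so the bias alone tilts attention toward one direction without cutting any link entirely.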
Relative Position Encodings

Relative Position Encodings[73] is similar to ALiBi, but more generic:

$$\text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^{\mathrm{T}}}{\sqrt{d_k}}+B\right)V$$

where $B$ is a Toeplitz matrix, that is, $B_{i,j}=B_{i',j'}$ whenever $i-j=i'-j'$. This is contrasted with the original sinusoidal positional encoding, which is an "absolute positional encoding".[74]

Efficient implementation

The transformer model has been implemented in standard deep learning frameworks such as TensorFlow and PyTorch. Transformers is a library produced by Hugging Face that supplies transformer-based architectures and pretrained models.[11]

KV caching

When an autoregressive transformer is used for inference, such as generating text, the query vector is different at each step, but the already-computed key and value vectors are always the same. The KV caching method saves the computed key and value vectors at each attention block, so that they are not recomputed at each new token. PagedAttention applies memory paging to KV caching.[75][76][77]

If a transformer is used with a baked-in prompt, such as ["You are a customer support agent..."], then the key and value vectors can be computed for the prompt, and saved on disk. The saving in compute is significant when the model is used for many short interactions, such as in online chatbots.

FlashAttention

FlashAttention[78] is an algorithm that implements the transformer attention mechanism efficiently on a GPU. It is a communication-avoiding algorithm that performs matrix multiplications in blocks, such that each block fits within the cache of a GPU, and by careful management of the blocks it minimizes data copying between GPU caches (as data movement is slow). See the page on softmax for details.
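The block-by-block idea can be illustrated on a CPU, with the caveat that the sketch below does not model GPU memory at all and is not the FlashAttention kernel. It only shows, for a single query and under the assumption that a running maximum and running normaliser are kept across blocks (the "online softmax" technique), why exact attention can be accumulated one block of keys and values at a time. All names and the toy data are hypothetical.

```python
import math

def naive_attention_row(q, K, V):
    """Reference: softmax over all scores at once, then a weighted sum of V."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [sum(e * v[c] for e, v in zip(exps, V)) / total for c in range(len(V[0]))]

def blockwise_attention_row(q, K, V, block_size):
    """Same result, but K and V are visited one block at a time."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(V[0])
    for start in range(0, len(K), block_size):
        k_block = K[start:start + block_size]
        v_block = V[start:start + block_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_block]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum (zero on the first block).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_block):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]

q = [0.3, -1.2, 0.8]
K = [[0.1 * (i + j) for j in range(3)] for i in range(7)]
V = [[float(i), float(i % 2)] for i in range(7)]
reference = naive_attention_row(q, K, V)
blocked = blockwise_attention_row(q, K, V, block_size=3)
```

The two results agree up to rounding, and the blockwise version never holds more than `block_size` scores at once, which is the property that lets a real kernel keep its working set in fast on-chip memory.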
An improved version, FlashAttention-2,[79][80][81] was developed to cater to the rising demand for language models capable of handling longer context lengths. It offers enhancements in work partitioning and parallelism, enabling it to achieve up to 230 TFLOPs/s on A100 GPUs (FP16/BF16), a 2x speed increase over the original FlashAttention.

Key advancements in FlashAttention-2 include the reduction of non-matmul FLOPs, improved parallelism over the sequence length dimension, better work partitioning between GPU warps, and added support for head dimensions up to 256 and multi-query attention (MQA) and grouped-query attention (GQA).[82]

Benchmarks revealed FlashAttention-2 to be up to 2x faster than FlashAttention and up to 9x faster than a standard attention implementation in PyTorch. Future developments include optimization for new hardware like H100 GPUs and new data types like FP8.

Multi-Query Attention

Figure: Comparison between several different forms of attention mechanism and the amount of KV caching necessary for each.

Multi-Query Attention changes the multiheaded attention mechanism.[83] Whereas normally,

$$\text{MultiheadedAttention}(Q,K,V)=\text{Concat}_{i\in[n_{\text{heads}}]}\left(\text{Attention}(XW_{i}^{Q},XW_{i}^{K},XW_{i}^{V})\right)W^{O}$$

with Multi-Query Attention, there is just one $W^{K},W^{V}$, thus:

$$\text{MultiQueryAttention}(Q,K,V)=\text{Concat}_{i\in[n_{\text{heads}}]}\left(\text{Attention}(XW_{i}^{Q},XW^{K},XW^{V})\right)W^{O}$$

This has a neutral effect on model quality and training speed, but increases inference speed.

More generally, grouped-query attention (GQA) partitions attention heads into groups, each of which shares the key-value pair.
MQA is GQA with one group, while standard multiheaded attention is GQA with the maximal number of groups.[84]

Figure: The architecture of V2, showing both MLA and a variant of mixture of experts.[85]: Figure 2

Multihead Latent Attention (MLA) is a low-rank approximation to standard MHA. Specifically, each hidden vector, before entering the attention mechanism, is first projected to two low-dimensional spaces ("latent space"), one for query and one for key-value (KV vector). This design minimizes the KV cache, as only the low-dimensional KV vector needs to be cached.[85]

Speculative decoding

Speculative decoding[86][87] is a method to accelerate token decoding. Similarly to speculative execution in CPUs, future tokens are computed quickly, then verified. If the quickly computed tokens are incorrect, they are discarded and computed slowly.

The key factor in speculative decoding is that a Transformer decoder can verify faster than it can decode, in the following sense.

Suppose we have two transformer models like GPT-3 and GPT-3-small, both with a context window size of 512. To generate an entire context window autoregressively with greedy decoding with GPT-3, it must be run for 512 times, each time generating a token $x_{1},x_{2},\dots,x_{512}$, taking time $512T_{\text{GPT-3}}$. However, if we had some educated guess for the values of these tokens, we could verify all of them in parallel, in one run of the model, by checking that each $x_{t}$ is indeed the token with the largest log-likelihood in the $t$-th output.

In speculative decoding, a smaller model or some other simple heuristic is used to generate a few speculative tokens that are subsequently verified by the larger model. For example, suppose we use GPT-3-small to generate four speculative tokens: $\tilde{x}_{1},\tilde{x}_{2},\tilde{x}_{3},\tilde{x}_{4}$.
This only takes $4T_{\text{GPT-3-small}}$. These tokens are then run through the larger GPT-3 in one go. Suppose that $\tilde{x}_{1}$ and $\tilde{x}_{2}$ are verified by GPT-3 as what it would have picked, then those are kept, but $\tilde{x}_{3}$ is not, so $\tilde{x}_{3},\tilde{x}_{4}$ are discarded, and GPT-3 is run on those. This would take $4T_{\text{GPT-3-small}}+3T_{\text{GPT-3}}$, which might be shorter than $4T_{\text{GPT-3}}$.

For non-greedy decoding, similar ideas apply, except the speculative tokens are accepted or rejected stochastically, in a way that guarantees the final output distribution is the same as if speculative decoding was not used.[86][88]

Multi-token prediction

In Multi-Token Prediction, a single forward pass creates a final embedding vector, which then is un-embedded into a token probability. However, that vector can then be further processed by another Transformer block to predict the next token, and so on for arbitrarily many steps into the future. This trades off accuracy for speed, since each new token costs just one more Transformer block, rather than the entire stack.[89][90]

Sub-quadratic transformers

Training transformer-based architectures can be expensive, especially for long inputs.[91] Many methods have been developed to attempt to address the issue. In the image domain, Swin Transformer is an efficient architecture that performs attention inside shifting windows.[92] In the audio domain, SepTr decouples the attention in time and frequency domains.[93] Long Range Arena (2020)[94] is a standard benchmark for comparing the behavior of transformer architectures over long inputs.
Alternative attention graphs

The standard attention graph is either all-to-all or causal, both of which scale as $O(N^{2})$ where $N$ is the number of tokens in a sequence.

Reformer (2020)[91][95] reduces the computational load from $O(N^{2})$ to $O(N\ln N)$ by using locality-sensitive hashing and reversible layers.[96]

Sparse attention[97] uses attention graphs that grow slower than $O(N^{2})$. For example, BigBird (2020)[98] uses random small-world networks which grow as $O(N)$.

Ordinary transformers require a memory size that is quadratic in the size of the context window. Attention-free transformers[99] reduce this to a linear dependence while still retaining the advantages of a transformer by linking the key to the value.

Random Feature Attention

Random Feature Attention (2021)[100] uses Fourier random features:

$$\varphi(x)=\frac{1}{\sqrt{D}}[\cos\langle w_{1},x\rangle,\sin\langle w_{1},x\rangle,\cdots\cos\langle w_{D},x\rangle,\sin\langle w_{D},x\rangle]^{T}$$

where $w_{1},\dots,w_{D}$ are independent samples from the normal distribution $N(0,\sigma^{2}I)$.
This choice of parameters satisfies $\mathbb{E}[\langle\varphi(x),\varphi(y)\rangle]=e^{-\frac{\|x-y\|^{2}}{2\sigma^{2}}}$, or

$$e^{\langle x,y\rangle/\sigma^{2}}=\mathbb{E}[\langle e^{\|x\|^{2}/2\sigma^{2}}\varphi(x),e^{\|y\|^{2}/2\sigma^{2}}\varphi(y)\rangle]\approx\langle e^{\|x\|^{2}/2\sigma^{2}}\varphi(x),e^{\|y\|^{2}/2\sigma^{2}}\varphi(y)\rangle$$

Consequently, the one-headed attention, with one query, can be written as

$$\text{Attention}(q,K,V)=\text{softmax}\left(\frac{qK^{\mathrm{T}}}{\sqrt{d_k}}\right)V\approx\frac{\varphi(q)^{T}\sum_{i}e^{\|k_{i}\|^{2}/2\sigma^{2}}\varphi(k_{i})v_{i}^{T}}{\varphi(q)^{T}\sum_{i}e^{\|k_{i}\|^{2}/2\sigma^{2}}\varphi(k_{i})}$$

where $\sigma=d_{K}^{1/4}$. Similarly for multiple queries, and for multiheaded attention.

This approximation can be computed in linear time, as we can compute the matrix $\varphi(k_{i})v_{i}^{T}$ first, then multiply it with the query. In essence, we have managed to obtain a more precise version of

$$\text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^{\mathrm{T}}}{\sqrt{d_k}}\right)V\approx Q(K^{T}V/\sqrt{d_k})$$

Performer (2022)[101] uses the same Random Feature Attention, but $w_{1},\dots,w_{D}$ are first independently sampled from the normal distribution $N(0,\sigma^{2}I)$, then they are Gram-Schmidt processed.
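A small numerical check can make the linear-time claim more tangible. The sketch below compares exact single-query attention with the random-feature estimate. One assumption needs flagging: the frequencies are drawn with standard deviation $1/\sigma$, since that is the scaling under which $\mathbb{E}[\cos\langle w,x-y\rangle]=e^{-\|x-y\|^{2}/(2\sigma^{2})}$ holds. The feature count `D`, the seed, the toy vectors and all function names are likewise choices made for illustration only.

```python
import math
import random

def features(x, W):
    """phi(x) = D^(-1/2) [cos<w_1,x>, sin<w_1,x>, ..., cos<w_D,x>, sin<w_D,x>]."""
    out = []
    for w in W:
        proj = sum(a * b for a, b in zip(w, x))
        out += [math.cos(proj), math.sin(proj)]
    norm = math.sqrt(len(W))
    return [f / norm for f in out]

def exact_attention(q, K, V):
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in K]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [sum(e * v[c] for e, v in zip(exps, V)) / total for c in range(len(V[0]))]

def rfa_attention(q, K, V, W, sigma):
    d_v = len(V[0])
    # Key-side summaries: built once in time linear in the number of keys,
    # and reusable for every query.
    S = [[0.0] * d_v for _ in range(2 * len(W))]   # sum_i c_i phi(k_i) v_i^T
    z = [0.0] * (2 * len(W))                       # sum_i c_i phi(k_i)
    for k, v in zip(K, V):
        c = math.exp(sum(a * a for a in k) / (2 * sigma ** 2))
        for r, f in enumerate(features(k, W)):
            z[r] += c * f
            for col in range(d_v):
                S[r][col] += c * f * v[col]
    phi_q = features(q, W)
    denom = sum(f * zr for f, zr in zip(phi_q, z))
    return [sum(f * S[r][col] for r, f in enumerate(phi_q)) / denom for col in range(d_v)]

rng = random.Random(0)
d_k, D = 4, 2000
sigma = d_k ** 0.25
# Assumed scaling: standard deviation 1 / sigma for each frequency component.
W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d_k)] for _ in range(D)]
q = [0.4, -0.2, 0.1, 0.3]
K = [[0.5, 0.1, -0.3, 0.2], [-0.4, 0.3, 0.2, -0.1], [0.1, -0.5, 0.4, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
approx = rfa_attention(q, K, V, W, sigma)
exact = exact_attention(q, K, V)
```

The query-side factor $e^{\|q\|^{2}/2\sigma^{2}}$ is omitted because it cancels between numerator and denominator. The estimate is stochastic, so `approx` only matches `exact` approximately, and the gap shrinks as `D` grows.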
Multimodality

Transformers can also be used or adapted for modalities (input or output) beyond just text, usually by finding a way to "tokenize" the modality.

Multimodal models can either be trained from scratch, or by finetuning. A 2022 study found that Transformers pretrained only on natural language can be finetuned on only 0.03% of parameters and become competitive with LSTMs on a variety of logical and visual tasks, demonstrating transfer learning.[102] LLaVA was a vision-language model composed of a language model (Vicuna-13B)[103] and a vision model (ViT-L/14), connected by a linear layer. Only the linear layer is finetuned.[104]

Vision transformers[41] adapt the transformer to computer vision by breaking down input images as a series of patches, turning them into vectors, and treating them like tokens in a standard transformer.

Conformer[42] and later Whisper[105] follow the same pattern for speech recognition, first turning the speech signal into a spectrogram, which is then treated like an image, i.e. broken down into a series of patches, turned into vectors and treated like tokens in a standard transformer.

Perceivers[106][107] are a variant of Transformers designed for multimodality.

For image generation, notable architectures are DALL-E 1 (2021), Parti (2022),[108] Phenaki (2023),[109] and Muse (2023).[110] Unlike later models, DALL-E is not a diffusion model. Instead, it uses a decoder-only Transformer that autoregressively generates a text, followed by the token representation of an image, which is then converted by a variational autoencoder to an image.[111] Parti is an encoder-decoder Transformer, where the encoder processes a text prompt, and the decoder generates a token representation of an image.[112] Muse is an encoder-only Transformer that is trained to predict masked image tokens from unmasked image tokens.
During generation, all input tokens are masked, and the highest-confidence predictions are included for the next iteration, until all tokens are predicted.[110] Phenaki is a text-to-video model. It is a bidirectional masked transformer conditioned on pre-computed text tokens. The generated tokens are then decoded to a video.[109]

Applications

The transformer has had great success in natural language processing (NLP). Many large language models such as GPT-2, GPT-3, GPT-4, Gemini, AlbertAGPT, Claude, BERT, Grok, XLNet, RoBERTa and ChatGPT demonstrate the ability of transformers to perform a wide variety of NLP-related subtasks and their related real-world applications, including:

machine translation
time series prediction
document summarization
document generation
named entity recognition (NER)[113]
writing computer code based on requirements expressed in natural language
speech-to-text

Beyond traditional NLP, the transformer architecture has had success in other applications, such as:

biological sequence analysis
video understanding
protein folding (such as AlphaFold)
evaluating chess board positions. Using static evaluation alone (that is, with no Minimax search) transformer achieved an Elo of 2895, putting it at grandmaster level.[10]

See also

seq2seq – Family of machine learning approaches
Perceiver – Variant of Transformer designed for multimodal data
Vision transformer – Machine learning model for vision processing
Large language model – Type of machine learning model
BERT (language model) – Series of language models developed by Google AI
Generative pre-trained transformer – Type of large language model
T5 (language model) – Series of large language models developed by Google AI

Notes

Gated recurrent units (2014) further reduced its complexity.
Some architectures, such as RWKV or state space models, avoid the issue.
References

Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention is All you Need" (PDF). Advances in Neural Information Processing Systems. 30. Curran Associates, Inc. Hochreiter, Sepp; Schmidhuber, Jürgen (1 November 1997). "Long Short-Term Memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. ISSN 0899-7667. PMID 9377276. S2CID 1915014. "Better Language Models and Their Implications". OpenAI. 2019-02-14. Archived from the original on 2020-12-19. Retrieved 2019-08-25. Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua (September 1, 2014). "Neural Machine Translation by Jointly Learning to Align and Translate". arXiv:1409.0473 [cs.CL]. Luong, Minh-Thang; Pham, Hieu; Manning, Christopher D. (August 17, 2015). "Effective Approaches to Attention-based Neural Machine Translation". arXiv:1508.04025 [cs.CL]. Chen, Lili; Lu, Kevin; Rajeswaran, Aravind; Lee, Kimin; Grover, Aditya; Laskin, Michael; Abbeel, Pieter; Srinivas, Aravind; Mordatch, Igor (2021-06-24), Decision Transformer: Reinforcement Learning via Sequence Modeling, arXiv:2106.01345 Parisotto, Emilio; Song, Francis; Rae, Jack; Pascanu, Razvan; Gulcehre, Caglar; Jayakumar, Siddhant; Jaderberg, Max; Kaufman, Raphaël Lopez; Clark, Aidan; Noury, Seb; Botvinick, Matthew; Heess, Nicolas; Hadsell, Raia (2020-11-21). "Stabilizing Transformers for Reinforcement Learning". Proceedings of the 37th International Conference on Machine Learning. PMLR: 7487–7498. Radford, Alec; Jong Wook Kim; Xu, Tao; Brockman, Greg; McLeavey, Christine; Sutskever, Ilya (2022). "Robust Speech Recognition via Large-Scale Weak Supervision". arXiv:2212.04356 [eess.AS]. Monastirsky, Maxim; Azulay, Osher; Sintov, Avishai (February 2023). "Learning to Throw With a Handful of Samples Using Decision Transformers". IEEE Robotics and Automation Letters. 8 (2): 576–583. doi:10.1109/LRA.2022.3229266. ISSN 2377-3766.
Ruoss, Anian; Delétang, Grégoire; Medapati, Sourabh; Grau-Moya, Jordi; Wenliang, Li; Catt, Elliot; Reid, John; Genewein, Tim (2024-02-07). "Grandmaster-Level Chess Without Search". arXiv:2402.04494v1 [cs.LG]. Wolf, Thomas; Debut, Lysandre; Sanh, Victor; Chaumond, Julien; Delangue, Clement; Moi, Anthony; Cistac, Pierric; Rault, Tim; Louf, Remi; Funtowicz, Morgan; Davison, Joe; Shleifer, Sam; von Platen, Patrick; Ma, Clara; Jernite, Yacine; Plu, Julien; Xu, Canwen; Le Scao, Teven; Gugger, Sylvain; Drame, Mariama; Lhoest, Quentin; Rush, Alexander (2020). "Transformers: State-of-the-Art Natural Language Processing". Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. pp. 38–45. doi:10.18653/v1/2020.emnlp-demos.6. S2CID 208117506. "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. 2 November 2018. Archived from the original on 2021-01-13. Retrieved 2019-08-25. Feldman, J. A.; Ballard, D. H. (1982-07-01). "Connectionist models and their properties". Cognitive Science. 6 (3): 205–254. doi:10.1016/S0364-0213(82)80001-3. ISSN 0364-0213. Rumelhart, David E.; McClelland, James L.; Hinton, Geoffrey E. (1987-07-29). Parallel Distributed Processing, Volume 1: Explorations in the Microstructure of Cognition: Foundations, Chapter 2 (PDF). Cambridge, Mass: Bradford Books. ISBN 978-0-262-68053-0. Giles, C. Lee; Maxwell, Tom (1987-12-01). "Learning, invariance, and generalization in high-order neural networks". Applied Optics. 26 (23): 4972–4978. doi:10.1364/AO.26.004972. ISSN 0003-6935. PMID 20523475. Schmidhuber, Jürgen (1992). "Learning to control fast-weight memories: an alternative to recurrent nets" (PDF). Neural Computation. 4 (1): 131–139. doi:10.1162/neco.1992.4.1.131. S2CID 16683347. Christoph von der Malsburg: The correlation theory of brain function. Internal Report 81-2, MPI Biophysical Chemistry, 1981. 
http://cogprints.org/1380/1/vdM_correlation.pdf See Reprint in Models of Neural Networks II, chapter 2, pages 95–119. Springer, Berlin, 1994. Jerome A. Feldman, "Dynamic connections in neural networks," Biological Cybernetics, vol. 46, no. 1, pp. 27–39, Dec. 1982. Hinton, Geoffrey E.; Plaut, David C. (1987). "Using Fast Weights to Deblur Old Memories". Proceedings of the Annual Meeting of the Cognitive Science Society. 9. Katharopoulos, Angelos; Vyas, Apoorv; Pappas, Nikolaos; Fleuret, François (2020). "Transformers are RNNs: Fast autoregressive Transformers with linear attention". ICML 2020. PMLR. pp. 5156–5165. Schlag, Imanol; Irie, Kazuki; Schmidhuber, Jürgen (2021). "Linear Transformers Are Secretly Fast Weight Programmers". ICML 2021. Springer. pp. 9355–9366. Cho, Kyunghyun; van Merriënboer, Bart; Gulcehre, Caglar; Bahdanau, Dzmitry; Bougares, Fethi; Schwenk, Holger; Bengio, Yoshua (October 2014). "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation". In Moschitti, Alessandro; Pang, Bo; Daelemans, Walter (eds.). Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics. pp. 1724–1734. arXiv:1406.1078. doi:10.3115/v1/D14-1179. Sutskever, Ilya; Vinyals, Oriol; Le, Quoc Viet (14 Dec 2014). "Sequence to sequence learning with neural networks". arXiv:1409.3215 [cs.CL]. [first version posted to arXiv on 10 Sep 2014] Chung, Junyoung; Gulcehre, Caglar; Cho, KyungHyun; Bengio, Yoshua (2014). "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling". arXiv:1412.3555 [cs.NE]. Gruber, N.; Jockisch, A. (2020), "Are GRU cells more specific and LSTM cells more sensitive in motive classification of text?", Frontiers in Artificial Intelligence, 3: 40, doi:10.3389/frai.2020.00040, PMC 7861254, PMID 33733157, S2CID 220252321 Sutskever, Ilya; Vinyals, Oriol; Le, Quoc V (2014). 
"Sequence to Sequence Learning with Neural Networks". Advances in Neural Information Processing Systems. 27. Curran Associates, Inc. arXiv:1409.3215. Luong, Minh-Thang; Pham, Hieu; Manning, Christopher D. (2015). "Effective Approaches to Attention-based Neural Machine Translation". arXiv:1508.04025 [cs.CL]. Wu, Yonghui; et al. (2016-09-01). "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation". arXiv:1609.08144 [cs.CL]. Lewis-Kraus, Gideon (2016-12-14). "The Great A.I. Awakening". The New York Times. ISSN 0362-4331. Archived from the original on 24 May 2023. Retrieved 2023-06-22. Parikh, Ankur P.; Täckström, Oscar; Das, Dipanjan; Uszkoreit, Jakob (2016-09-25). "A Decomposable Attention Model for Natural Language Inference". arXiv:1606.01933 [cs.CL]. Levy, Steven. "8 Google Employees Invented Modern AI. Here's the Inside Story". Wired. ISSN 1059-1028. Archived from the original on 20 Mar 2024. Retrieved 2024-08-06. Cheng, Jianpeng; Dong, Li; Lapata, Mirella (November 2016). "Long Short-Term Memory-Networks for Machine Reading". In Su, Jian; Duh, Kevin; Carreras, Xavier (eds.). Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas: Association for Computational Linguistics. pp. 551–561. doi:10.18653/v1/D16-1053. Peng, Bo; Alcaide, Eric; Anthony, Quentin; Albalak, Alon; Arcadinho, Samuel; Biderman, Stella; Cao, Huanqi; Cheng, Xin; Chung, Michael (2023-12-10), RWKV: Reinventing RNNs for the Transformer Era, arXiv:2305.13048 Marche, Stephen (2024-08-23). "Was Linguistic A.I. Created by Accident?". The New Yorker. ISSN 0028-792X. Retrieved 2024-08-27. Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL]. "Google: BERT now used on almost every English query". Search Engine Land. 2020-10-15. Retrieved 2020-11-24. 
"Recent Advances in Google Translate". research.google. Retrieved 2024-05-08. "The inside story of how ChatGPT was built from the people who made it". MIT Technology Review. Retrieved 2024-08-06. "Improving language understanding with unsupervised learning". openai.com. June 11, 2018. Archived from the original on 2023-03-18. Retrieved 2023-03-18. finetune-transformer-lm, OpenAI, June 11, 2018, retrieved 2023-05-01 Dosovitskiy, Alexey; Beyer, Lucas; Kolesnikov, Alexander; Weissenborn, Dirk; Zhai, Xiaohua; Unterthiner, Thomas; Dehghani, Mostafa; Minderer, Matthias; Heigold, Georg; Gelly, Sylvain; Uszkoreit, Jakob (2021-06-03). "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale". arXiv:2010.11929 [cs.CV]. Gulati, Anmol; Qin, James; Chiu, Chung-Cheng; Parmar, Niki; Zhang, Yu; Yu, Jiahui; Han, Wei; Wang, Shibo; Zhang, Zhengdong; Wu, Yonghui; Pang, Ruoming (2020). "Conformer: Convolution-augmented Transformer for Speech Recognition". arXiv:2005.08100 [eess.AS]. Choromanski, Krzysztof; Likhosherstov, Valerii; Dohan, David; Song, Xingyou; Gane, Andreea; Sarlos, Tamas; Hawkins, Peter; Davis, Jared; Mohiuddin, Afroz (2022-11-19), Rethinking Attention with Performers, arXiv:2009.14794 Liu, Zhuang; Mao, Hanzi; Wu, Chao-Yuan; Feichtenhofer, Christoph; Darrell, Trevor; Xie, Saining (2022). A ConvNet for the 2020s. Conference on Computer Vision and Pattern Recognition. pp. 11976–11986. Esser, Patrick; Kulal, Sumith; Blattmann, Andreas; Entezari, Rahim; Müller, Jonas; Saini, Harry; Levi, Yam; Lorenz, Dominik; Sauer, Axel (2024-03-05), Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, arXiv:2403.03206 Xiong, Ruibin; Yang, Yunchang; He, Di; Zheng, Kai; Zheng, Shuxin; Xing, Chen; Zhang, Huishuai; Lan, Yanyan; Wang, Liwei; Liu, Tie-Yan (2020-06-29). "On Layer Normalization in the Transformer Architecture". arXiv:2002.04745 [cs.LG]. 
Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J. (2020-01-01). "Exploring the limits of transfer learning with a unified text-to-text transformer". The Journal of Machine Learning Research. 21 (1): 140:5485–140:5551. arXiv:1910.10683. ISSN 1532-4435. Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J. (2019). "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". arXiv:1910.10683 [cs.LG]. "Masked language modeling". huggingface.co. Retrieved 2023-10-05. "Causal language modeling". huggingface.co. Retrieved 2023-10-05. Tay, Yi; Dehghani, Mostafa; Tran, Vinh Q.; Garcia, Xavier; Wei, Jason; Wang, Xuezhi; Chung, Hyung Won; Shakeri, Siamak; Bahri, Dara (2023-02-28), UL2: Unifying Language Learning Paradigms, arXiv:2205.05131 Press, Ofir; Wolf, Lior (2017-02-21), Using the Output Embedding to Improve Language Models, arXiv:1608.05859 Lintz, Nathan (2016-04-18). "Sequence Modeling with Neural Networks (Part 2): Attention Models". Indico. Archived from the original on 2020-10-21. Retrieved 2019-10-15. Alammar, Jay. "The Illustrated Transformer". jalammar.github.io. Archived from the original on 2020-10-18. Retrieved 2019-10-15. Team, Keras. "Keras documentation: GPT2Backbone model". keras.io. Retrieved 2024-08-08. Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. (August 2019). "What Does BERT Look at? An Analysis of BERT's Attention". Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Florence, Italy: Association for Computational Linguistics: 276–286. arXiv:1906.04341. doi:10.18653/v1/W19-4828. Archived from the original on 2020-10-21. Retrieved 2020-05-20. Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Russ R; Le, Quoc V (2019). 
"XLNet: Generalized Autoregressive Pretraining for Language Understanding". Advances in Neural Information Processing Systems. 32. Curran Associates, Inc. arXiv:1906.08237. Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021. Wang, Qiang; Li, Bei; Xiao, Tong; Zhu, Jingbo; Li, Changliang; Wong, Derek F.; Chao, Lidia S. (2019-06-04), Learning Deep Transformer Models for Machine Translation, arXiv:1906.01787 Phuong, Mary; Hutter, Marcus (2022-07-19), Formal Algorithms for Transformers, arXiv:2207.09238 Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J. (2020). "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Journal of Machine Learning Research. 21 (140): 1–67. arXiv:1910.10683. ISSN 1533-7928. "Recent Advances in Google Translate". Google Research. June 8, 2020. Archived from the original on 4 Jul 2024. Retrieved 2024-08-07. Shazeer, Noam (2020-02-01). "GLU Variants Improve Transformer". arXiv:2002.05202 [cs.LG]. Hendrycks, Dan; Gimpel, Kevin (2016-06-27). "Gaussian Error Linear Units (GELUs)". arXiv:1606.08415v5 [cs.LG]. Zhang, Biao; Sennrich, Rico (2019). "Root Mean Square Layer Normalization". Advances in Neural Information Processing Systems. 32. Curran Associates, Inc. arXiv:1910.07467. Tembine, Hamidou, Manzoor Ahmed Khan, and Issa Bamia. 2024. "Mean-Field-Type Transformers" Mathematics 12, no. 22: 3506. https://doi.org/10.3390/math12223506 Nguyen, Toan Q.; Salazar, Julian (2019-11-02). Niehues, Jan; Cattoni, Rolando; Stüker, Sebastian; Negri, Matteo; Turchi, Marco; Ha, Thanh-Le; Salesky, Elizabeth; Sanabria, Ramon; Barrault, Loic (eds.). "Transformers without Tears: Improving the Normalization of Self-Attention". 
Proceedings of the 16th International Conference on Spoken Language Translation. Hong Kong: Association for Computational Linguistics. arXiv:1910.05895. doi:10.5281/zenodo.3525484. Dufter, Philipp; Schmitt, Martin; Schütze, Hinrich (2022-06-06). "Position Information in Transformers: An Overview". Computational Linguistics. 48 (3): 733–763. arXiv:2102.11090. doi:10.1162/coli_a_00445. ISSN 0891-2017. S2CID 231986066. Gehring, Jonas; Auli, Michael; Grangier, David; Yarats, Denis; Dauphin, Yann N. (2017-07-17). "Convolutional Sequence to Sequence Learning". Proceedings of the 34th International Conference on Machine Learning. PMLR: 1243–1252. Haviv, Adi; Ram, Ori; Press, Ofir; Izsak, Peter; Levy, Omer (2022-12-05), Transformer Language Models without Positional Encodings Still Learn Positional Information, arXiv:2203.16634 Su, Jianlin; Lu, Yu; Pan, Shengfeng; Murtadha, Ahmed; Wen, Bo; Liu, Yunfeng (2021-04-01). "RoFormer: Enhanced Transformer with Rotary Position Embedding". arXiv:2104.09864 [cs.CL]. Press, Ofir; Smith, Noah A.; Lewis, Mike (2021-08-01). "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation". arXiv:2108.12409 [cs.CL]. Shaw, Peter; Uszkoreit, Jakob; Vaswani, Ashish (2018). "Self-Attention with Relative Position Representations". arXiv:1803.02155 [cs.CL]. Ke, Guolin; He, Di; Liu, Tie-Yan (2021-03-15), Rethinking Positional Encoding in Language Pre-training, arXiv:2006.15595 Kwon, Woosuk; Li, Zhuohan; Zhuang, Siyuan; Sheng, Ying; Zheng, Lianmin; Yu, Cody Hao; Gonzalez, Joseph; Zhang, Hao; Stoica, Ion (2023-10-23). "Efficient Memory Management for Large Language Model Serving with PagedAttention". Proceedings of the 29th Symposium on Operating Systems Principles. SOSP '23. New York, NY, USA: Association for Computing Machinery. pp. 611–626. arXiv:2309.06180. doi:10.1145/3600006.3613165. ISBN 979-8-4007-0229-7. 
vllm-project/vllm, vLLM, 2024-06-20, retrieved 2024-06-20 Kwon, Woosuk; Li, Zhuohan; Zhuang, Siyuan; Sheng, Ying; Zheng, Lianmin; Yu, Cody; Gonzalez, Joey; Zhang, Hao; Stoica, Ion (2023-06-20). "vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention". vLLM Blog. Retrieved 2024-06-20. Dao, Tri; Fu, Dan; Ermon, Stefano; Rudra, Atri; Ré, Christopher (2022-12-06). "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness". Advances in Neural Information Processing Systems. 35: 16344–16359. arXiv:2205.14135. "Stanford CRFM". crfm.stanford.edu. Retrieved 2023-07-18. "FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning". Princeton NLP. 2023-06-17. Retrieved 2023-07-18. "Introducing Together AI Chief Scientist Tri Dao, as he releases FlashAttention-2 to speed up model training and inference". TOGETHER. Retrieved 2023-07-18. Ainslie, Joshua; Lee-Thorp, James; de Jong, Michiel; Zemlyanskiy, Yury; Lebrón, Federico; Sanghai, Sumit (2023-12-23). "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints". arXiv:2305.13245 [cs.CL]. Chowdhery, Aakanksha; Narang, Sharan; Devlin, Jacob; Bosma, Maarten; Mishra, Gaurav; Roberts, Adam; Barham, Paul; Chung, Hyung Won; Sutton, Charles; Gehrmann, Sebastian; Schuh, Parker; Shi, Kensen; Tsvyashchenko, Sasha; Maynez, Joshua; Rao, Abhishek (2022-04-01). "PaLM: Scaling Language Modeling with Pathways". arXiv:2204.02311 [cs.CL]. Ainslie, Joshua; Lee-Thorp, James; de Jong, Michiel; Zemlyanskiy, Yury; Lebrón, Federico; Sanghai, Sumit (2023-12-23), GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints, arXiv:2305.13245 DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Dengr, Chengqi; Ruan, Chong (19 June 2024), DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, arXiv:2405.04434.
Leviathan, Yaniv; Kalman, Matan; Matias, Yossi (2023-05-18), Fast Inference from Transformers via Speculative Decoding, arXiv:2211.17192 Fu, Yao (2023-12-13). "Towards 100x Speedup: Full Stack Transformer Inference Optimization". Chen, Charlie; Borgeaud, Sebastian; Irving, Geoffrey; Lespiau, Jean-Baptiste; Sifre, Laurent; Jumper, John (2023-02-02), Accelerating Large Language Model Decoding with Speculative Sampling, arXiv:2302.01318 Gloeckle, Fabian; Badr Youbi Idrissi; Rozière, Baptiste; Lopez-Paz, David; Synnaeve, Gabriel (2024). "Better & Faster Large Language Models via Multi-token Prediction". arXiv:2404.19737 [cs.CL]. DeepSeek-AI; et al. (2024). "DeepSeek-V3 Technical Report". arXiv:2412.19437 [cs.CL]. Kitaev, Nikita; Kaiser, Łukasz; Levskaya, Anselm (2020). "Reformer: The Efficient Transformer". arXiv:2001.04451 [cs.LG]. Liu, Ze; Lin, Yutong; Cao, Yue; Hu, Han; Wei, Yixuan; Zhang, Zheng; Lin, Stephen; Guo, Baining (2021). "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows". 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE. pp. 9992–10002. arXiv:2103.14030. doi:10.1109/ICCV48922.2021.00986. ISBN 978-1-6654-2812-5. Ristea, Nicolaea Catalin; Ionescu, Radu Tudor; Khan, Fahad Shahbaz (2022-09-18). "SepTr: Separable Transformer for Audio Spectrogram Processing". Interspeech. ISCA: 4103–4107. arXiv:2203.09581. doi:10.21437/Interspeech.2022-249. Tay, Yi; Dehghani, Mostafa; Abnar, Samira; Shen, Yikang; Bahri, Dara; Pham, Philip; Rao, Jinfeng; Yang, Liu; Ruder, Sebastian; Metzler, Donald (2020-11-08). "Long Range Arena: A Benchmark for Efficient Transformers". arXiv:2011.04006 [cs.LG]. "Reformer: The Efficient Transformer". Google AI Blog. 16 January 2020. Archived from the original on 2020-10-22. Retrieved 2020-10-22. Gomez, Aidan N; Ren, Mengye; Urtasun, Raquel; Grosse, Roger B (2017). "The Reversible Residual Network: Backpropagation Without Storing Activations". Advances in Neural Information Processing Systems. 30. 
Curran Associates, Inc. arXiv:1707.04585. Child, Rewon; Gray, Scott; Radford, Alec; Sutskever, Ilya (2019-04-23), Generating Long Sequences with Sparse Transformers, arXiv:1904.10509 "Constructing Transformers For Longer Sequences with Sparse Attention Methods". Google AI Blog. 25 March 2021. Archived from the original on 2021-09-18. Retrieved 2021-05-28. Zhai, Shuangfei; Talbott, Walter; Srivastava, Nitish; Huang, Chen; Goh, Hanlin; Zhang, Ruixiang; Susskind, Josh (2021-09-21). "An Attention Free Transformer". arXiv:2105.14103 [cs.LG]. Peng, Hao; Pappas, Nikolaos; Yogatama, Dani; Schwartz, Roy; Smith, Noah A.; Kong, Lingpeng (2021-03-19). "Random Feature Attention". arXiv:2103.02143 [cs.CL]. Choromanski, Krzysztof; Likhosherstov, Valerii; Dohan, David; Song, Xingyou; Gane, Andreea; Sarlos, Tamas; Hawkins, Peter; Davis, Jared; Belanger, David; Colwell, Lucy; Weller, Adrian (2020-09-30). "Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers". arXiv:2006.03555 [cs.LG]. Lu, Kevin; Grover, Aditya; Abbeel, Pieter; Mordatch, Igor (2022-06-28). "Frozen Pretrained Transformers as Universal Computation Engines". Proceedings of the AAAI Conference on Artificial Intelligence. 36 (7): 7628–7636. doi:10.1609/aaai.v36i7.20729. ISSN 2374-3468. "Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality | LMSYS Org". lmsys.org. Retrieved 2024-08-11. Liu, Haotian; Li, Chunyuan; Wu, Qingyang; Lee, Yong Jae (2023-12-15). "Visual Instruction Tuning". Advances in Neural Information Processing Systems. 36: 34892–34916. Radford, Alec; Kim, Jong Wook; Xu, Tao; Brockman, Greg; McLeavey, Christine; Sutskever, Ilya (2022). "Robust Speech Recognition via Large-Scale Weak Supervision". arXiv:2212.04356 [eess.AS]. Jaegle, Andrew; Gimeno, Felix; Brock, Andrew; Zisserman, Andrew; Vinyals, Oriol; Carreira, Joao (2021-06-22). "Perceiver: General Perception with Iterative Attention". arXiv:2103.03206 [cs.CV]. 
Jaegle, Andrew; Borgeaud, Sebastian; Alayrac, Jean-Baptiste; Doersch, Carl; Ionescu, Catalin; Ding, David; Koppula, Skanda; Zoran, Daniel; Brock, Andrew; Shelhamer, Evan; Hénaff, Olivier (2021-08-02). "Perceiver IO: A General Architecture for Structured Inputs & Outputs". arXiv:2107.14795 [cs.LG]. "Parti: Pathways Autoregressive Text-to-Image Model". sites.research.google. Retrieved 2024-08-09. Villegas, Ruben; Babaeizadeh, Mohammad; Kindermans, Pieter-Jan; Moraldo, Hernan; Zhang, Han; Saffar, Mohammad Taghi; Castro, Santiago; Kunze, Julius; Erhan, Dumitru (2022-09-29). "Phenaki: Variable Length Video Generation from Open Domain Textual Descriptions". Chang, Huiwen; Zhang, Han; Barber, Jarred; Maschinot, A. J.; Lezama, Jose; Jiang, Lu; Yang, Ming-Hsuan; Murphy, Kevin; Freeman, William T. (2023-01-02). "Muse: Text-To-Image Generation via Masked Generative Transformers". arXiv:2301.00704 [cs.CV]. Ramesh, Aditya; Pavlov, Mikhail; Goh, Gabriel; Gray, Scott; Voss, Chelsea; Radford, Alec; Chen, Mark; Sutskever, Ilya (2021-02-26), Zero-Shot Text-to-Image Generation, arXiv:2102.12092 Yu, Jiahui; Xu, Yuanzhong; Koh, Jing Yu; Luong, Thang; Baid, Gunjan; Wang, Zirui; Vasudevan, Vijay; Ku, Alexander; Yang, Yinfei (2022-06-21), Scaling Autoregressive Models for Content-Rich Text-to-Image Generation, arXiv:2206.10789 Kariampuzha, William; Alyea, Gioconda; Qu, Sue; Sanjak, Jaleal; Mathé, Ewy; Sid, Eric; Chatelaine, Haley; Yadaw, Arjun; Xu, Yanji; Zhu, Qian (2023). "Precision information extraction for rare disease epidemiology at scale". Journal of Translational Medicine. 21 (1): 157. doi:10.1186/s12967-023-04011-y. PMC 9972634. PMID 36855134.

Further reading

Alexander Rush, The Annotated transformer Archived 2021-09-22 at the Wayback Machine, Harvard NLP group, 3 April 2018 Phuong, Mary; Hutter, Marcus (2022). "Formal Algorithms for Transformers". arXiv:2207.09238 [cs.LG].
Ferrando, Javier; Sarti, Gabriele; Bisazza, Arianna; Costa-jussà, Marta R. (2024-05-01). "A Primer on the Inner Workings of Transformer-based Language Models". arXiv:2405.00208 [cs.CL]. Leech, Gavin (2024-11-06). "Transformer++". argmin gravitas. Archived from the original on 2025-02-26. Retrieved 2025-05-08.

This page was last edited on 26 July 2025, at 01:38 (UTC). Text is available under the Creative Commons Attribution-ShareAlike 4.0 License; additional terms may apply.