Artificial intelligence is becoming an important tool in chemistry, offering new ways to solve complex problems that traditional methods struggle with. One branch of artificial intelligence that is gaining ground in the chemical sector is machine learning. It uses algorithms and data analysis to predict outcomes and perform tasks without being explicitly programmed for them.
The catch is that machine learning needs large datasets to make reliable predictions. The chemical industry often works with limited data, which hinders the ability of these computational methods to learn and improve. Researchers working with Berend Smit of the École Polytechnique Fédérale de Lausanne (EPFL) have devised a clever way to make use of large language models such as GPT-3, which understand and emulate human text well after being trained on a huge body of literature. ChatGPT, a well-known AI program, is built on the GPT-3 architecture.
Their work, published in the journal Nature Machine Intelligence, presents a new technique that significantly simplifies the use of AI to analyze chemical data. Despite initial skepticism, the approach does not simply pose chemistry questions to GPT-3. Lead researcher Kevin Jablonka notes, “Because GPT-3 has not been exposed to a wide range of chemistry texts, chemistry questions put to ChatGPT mostly return results comparable to a Wikipedia search.”
“The key,” he explains, “is to fine-tune GPT-3 with a small set of question-answer pairs. We are building a specialized model that is tuned to provide accurate chemical knowledge.” This involves giving GPT-3 a curated list of questions and answers. As Smit describes, “Take the study of high-entropy alloys, where it is very important to know whether an alloy is single-phase or multiphase. We collect known facts and give them to GPT-3 in the form Q = ‘Does a particular high-entropy alloy have a single phase?’ A = ‘Yes’ or ‘No’.”
“By adapting GPT-3 to this information,” he says, “we get a special AI tool designed to answer that particular yes or no question.” In tests, their model, trained on only a small set of questions and answers, correctly answered more than 95% of a wide range of chemistry problems, often surpassing the accuracy of state-of-the-art machine learning models. “The beauty here is the simplicity, similar to conducting a literature review, but effective for many chemistry challenges,” says Smit.
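The question-answer setup described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the researchers' code: the alloy names and labels are placeholders rather than real measurements, the helper names (make_pairs, to_jsonl, accuracy) are hypothetical, and the prompt/completion layout written one JSON object per line is an assumption about a common fine-tuning data format. No model is called here; the sketch only shows how known facts might be turned into training pairs and how yes/no answers could be scored.

```python
import json

# Question template taken from the example in the text; {alloy} is filled per record.
QUESTION = "Does the high-entropy alloy {alloy} have a single phase?"


def make_pairs(records):
    """Turn (alloy_name, is_single_phase) records into question-answer pairs."""
    pairs = []
    for alloy, is_single_phase in records:
        pairs.append({
            "prompt": QUESTION.format(alloy=alloy),
            "completion": "Yes" if is_single_phase else "No",
        })
    return pairs


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines: one JSON object per line (assumed format)."""
    return "\n".join(json.dumps(pair) for pair in pairs)


def accuracy(predictions, pairs):
    """Fraction of predicted answers that match the known Yes/No answers."""
    correct = sum(
        pred.strip().lower() == pair["completion"].lower()
        for pred, pair in zip(predictions, pairs)
    )
    return correct / len(pairs)


# Placeholder records: names and labels are made up for illustration only.
records = [("alloy-A", True), ("alloy-B", False), ("alloy-C", True)]
pairs = make_pairs(records)
print(to_jsonl(pairs))
```

In practice the serialized pairs would be handed to whatever fine-tuning service is in use, and the accuracy helper would be applied to the tuned model's answers on held-out questions that were not part of the training pairs.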
What sets this research apart is its directness and speed. Traditional machine learning models take months to build and require deep technical expertise, while the method from Jablonka and colleagues can be set up in minutes and requires no prior expertise. The process is as convenient as a literature search, yet potentially transformative for many chemical studies. Being able to phrase a query such as “Is the yield of a certain [chemical] high with this methodology?” and obtain an accurate answer could significantly change how chemical research is planned and carried out.
In their paper, the authors suggest: “Along with literature searches, querying a foundational model such as GPT-3 may soon become a standard initial step for drawing on the knowledge aggregated in these models.” Smit sums it up: “This innovation should change the way we approach chemistry.”