How To Run Meta’s New AI System ‘Galactica’ in a Python Environment | by Maximilian Strauss | Dec, 2022

The controversial AI experiment that was shut down after only two days

Photo by Steve Johnson on Unsplash

On November 15, Galactica was officially presented to the world. It is a large language model for science from Meta AI and is supposed to be able to automatically organize scientific knowledge.

In practice, this means the ability to “summarize academic literature, solve math problems, create wiki articles, write scientific code, annotate molecules and proteins, and much more.”

Meta AI was founded by Mark Zuckerberg and deep learning veterans Yann LeCun and Rob Fergus. According to the preprint paper, Galactica was trained on a huge scientific corpus including, to name a few sources, 48 million papers, textbooks, and lecture notes. The paper even opens its description of the dataset with a quote from Galileo Galilei.

On some tasks, it significantly outperformed GPT-3, scoring 68.2% versus 49.0%. Naturally, this generated a lot of interest. At release, the web page allowed anyone to try out the model. However, when the first results were posted on social media, it soon became apparent that something was wrong.

Many of the generated results did not make sense. Some called it an AI knowledge base that makes stuff up; others described it as an example of what dangerous AI looks like. Only two days after going public, the demo was shut down.

Now, in case you’ve been curious despite the public backlash, there’s still a way to get Galactica running on your own system even though the public demo is offline. This story will guide you through it.

We’ll replicate a demo prompt, try our own, and see the results and computation times on GPU and CPU.

The hardware required to train Galactica is extraordinary.

There are several models of different sizes, ranging from Mini (125M parameters) to Huge (120B parameters).
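To make the sizes concrete, here is a small lookup that maps each size name to its parameter count and a Hugging Face Hub model ID. The IDs for Standard and Mini appear later in this article; the others are assumed to follow the same `facebook/galactica-<size>` pattern, and the helpers `model_id` and `n_params` are hypothetical names for this sketch.

```python
# Galactica model sizes (parameter counts) as listed above.
# Assumption: Hub IDs follow the "facebook/galactica-<size>" pattern used
# for facebook/galactica-6.7b and facebook/galactica-125m in this article.
GALACTICA_MODELS = {
    "Mini": ("125m", 125e6),
    "Base": ("1.3b", 1.3e9),
    "Standard": ("6.7b", 6.7e9),
    "Large": ("30b", 30e9),
    "Huge": ("120b", 120e9),
}


def model_id(size_name: str) -> str:
    """Return the (assumed) Hugging Face Hub ID for a Galactica size name."""
    suffix, _ = GALACTICA_MODELS[size_name]
    return f"facebook/galactica-{suffix}"


def n_params(size_name: str) -> float:
    """Return the approximate parameter count for a Galactica size name."""
    _, params = GALACTICA_MODELS[size_name]
    return params


for name in GALACTICA_MODELS:
    print(f"{name:<9} {n_params(name):>8.3g} params -> {model_id(name)}")
```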

The Huge model was trained on 128 (!) Nvidia A100 80GB graphics cards. While inference is supposed to run on a single device, each of these cards costs over $10,000.

So running the largest model isn’t something you will do on a consumer-grade PC anytime soon. However, we can run some of the smaller models.

Let’s first look at how the smaller models should perform: we can estimate how they compare to the Huge model via their validation loss (Figure 6 in the paper):

  Model     Parameters    Loss
---------- ------------ -------
Huge         120B        ~1.8
Large        30B         ~1.81
Standard     6.7B        ~2
Base         1.3B        ~2.25
Mini         125M        ~2.8

Comparing the models, the loss of the Mini model is more than 50% higher than that of the Huge model. While this gives us a quantitative estimate of the loss, it is hard to tell what this means for the quality of the results. Later, we will test the same prompts on different model sizes.
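The “more than 50%” figure follows directly from the table. A quick sketch, using the approximate losses read off the table above, that expresses each model’s loss relative to the Huge model:

```python
# Approximate validation losses from the table above (read off Figure 6 of the paper).
losses = {"Huge": 1.8, "Large": 1.81, "Standard": 2.0, "Base": 2.25, "Mini": 2.8}

# Relative increase of each model's loss over the Huge model's loss.
reference = losses["Huge"]
relative_increase = {name: loss / reference - 1 for name, loss in losses.items()}

for name, rel in relative_increase.items():
    print(f"{name:<9} {rel:+.1%}")
```

For the Mini model this gives roughly +55.6%, while the Standard model is only about 11% above the Huge model.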

To download the models, you need enough disk space. For the Standard model, you’ll need ~25 GB, and the required space scales with the number of parameters.
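A back-of-the-envelope check of the ~25 GB figure: multiply the parameter count by the bytes stored per parameter. Assuming the checkpoints are stored as 32-bit floats (4 bytes per parameter; this is an assumption, not something stated in the article), the Standard model comes out at about 25 GiB. The helper `checkpoint_size_gib` is hypothetical.

```python
BYTES_PER_PARAM_FP32 = 4  # assumption: checkpoints are stored as float32


def checkpoint_size_gib(n_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP32) -> float:
    """Rough on-disk size of the weights in GiB (ignores tokenizer/config files)."""
    return n_params * bytes_per_param / 2**30


sizes = [("Mini", 125e6), ("Base", 1.3e9), ("Standard", 6.7e9), ("Large", 30e9), ("Huge", 120e9)]
for name, params in sizes:
    print(f"{name:<9} ~{checkpoint_size_gib(params):,.1f} GiB")
```

Passing `bytes_per_param=2` gives the corresponding estimate for half-precision weights.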


Galactica runs on Python, so you’ll need a working Python environment.

The instructions presented here assume a package manager such as Miniconda or Anaconda. The Galactica GitHub repository provides a quick-start guide for installing Galactica via pip: pip install galai.

For me, this didn’t work and raised CUDA errors, even though I tested it on a non-CUDA system.

Fortunately, the model weights are also available on the Hugging Face Hub, and they can be used out of the box with the Transformers library.

It worked flawlessly for me, and this is the method we’ll describe here. As a side note: the model also runs on a CPU, but GPU acceleration is highly recommended, although installation can be tricky. I tested it on a Windows system with a GeForce 2080 Ti and on a Macintosh system with an Apple M1 Max.

First, make sure you have set up your GPU environment correctly. For NVIDIA GPUs, you can follow one of the many CUDA installation guides, e.g., this one for Windows. You can quickly check whether CUDA is working by running nvidia-smi on the command line; it should list the installed driver version and the CUDA version it supports. For M1 Mac systems, you will need the Xcode command line tools; you can find a good guide here.
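If you prefer to run that check from Python (for example, at the top of a notebook), a minimal sketch could call nvidia-smi via the standard library. The helper name `nvidia_smi_report` is hypothetical; it simply returns None when the tool is not on the PATH, as will be the case on an M1 Mac.

```python
import shutil
import subprocess
from typing import Optional


def nvidia_smi_report() -> Optional[str]:
    """Return the output of nvidia-smi, or None if the tool is not on PATH."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tools found (e.g., on an M1 Mac or a CPU-only machine).
        return None
    # check=False: a non-zero exit (e.g., driver problems) still returns its output.
    result = subprocess.run([exe], capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


report = nvidia_smi_report()
print(report if report is not None else "nvidia-smi not found")
```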

  • Start by creating a new conda environment: conda create -n torch-gal python=3.8
  • and then activate it: conda activate torch-gal

Galactica uses PyTorch, so follow the installation instructions on the PyTorch website for your system configuration. For Windows and a CUDA graphics card, this means making sure you select the correct CUDA toolkit version. For M1 Mac systems, you need to select the Preview (Nightly) version if you want GPU acceleration.

  • Example for CUDA 11.7: conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
  • Example for M1: conda install pytorch -c pytorch-nightly
  • Install some utility packages: conda install numpy tokenizers
  • Next, install the Transformers library: pip install transformers accelerate
  • Optionally, if you want to use Jupyter Notebook for coding: conda install -c conda-forge jupyter jupyterlab
  • You can test whether PyTorch can access the GPU from within Python with the following functions. Mac: torch.backends.mps.is_available(); CUDA: torch.cuda.is_available()
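The two checks above can be combined into a small helper that picks a device string, preferring CUDA, then MPS, then the CPU. This is a sketch: the name `pick_device` is hypothetical, and the fallback to "cpu" when PyTorch is not installed is only a convenience.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment: fall back to the CPU label.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch builds, so check defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```

The returned string can be passed to .to(...) when moving tensors or models.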

Once everything is set up, we can reproduce the prompts from the Galactica demo. We will use the following prompt as an example:

  • The Transformer architecture [START_REF]

This prompt contains the reference keyword [START_REF] and should give us a reference to the famous “Attention Is All You Need” paper.


We can use the high-level pipeline API from Transformers to run Galactica on the CPU. Replace the model string with the corresponding model (for example, facebook/galactica-6.7b is the Standard model, and facebook/galactica-125m is the Mini model). Note that on first execution, the library downloads the weights into its cache. The cache will probably be on your main drive and not necessarily where the code is saved, so make sure you have enough disk space there. As mentioned above, the models can be quite large.
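Before the first download, it can be worth checking that the drive holding the cache has enough room. A minimal sketch using only the standard library; it assumes the cache lives under the HF_HOME environment variable if set, otherwise under ~/.cache/huggingface, and the helper `enough_disk_space` is hypothetical.

```python
import os
import shutil
from pathlib import Path

# Assumption: the Hugging Face cache lives under HF_HOME if it is set,
# otherwise under ~/.cache/huggingface.
DEFAULT_CACHE = Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))


def enough_disk_space(required_bytes: int, path: Path = DEFAULT_CACHE) -> bool:
    """Check whether the filesystem holding `path` has `required_bytes` free."""
    path = Path(path)
    # The cache directory may not exist yet; walk up to the nearest existing parent.
    while not path.exists() and path != path.parent:
        path = path.parent
    return shutil.disk_usage(path).free >= required_bytes


# The Standard model needs roughly 25 GB of disk space.
print(enough_disk_space(25 * 10**9))
```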

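from transformers import pipeline

# Load the Standard model (6.7B) into a text-generation pipeline (runs on the CPU by default)
model = pipeline("text-generation", model="facebook/galactica-6.7b")
input_text = "The Transformer architecture [START_REF]"
print(model(input_text))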

This gives us the following result, which is the expected reference to the Transformer paper.

[{'generated_text': 'The Transformer architecture [START_REF] Attention is all you need, Vaswani a sequence-to-'}]


To run the same prompt on the GPU, we pass the device as an argument when creating the pipeline, for example, pipeline(..., device=0) for the first CUDA GPU. At the time of writing, there is no support for Apple’s MPS when using pipeline.

To work around this, we can use a lower-level API that gives us more control over where the code is executed: OPTForCausalLM, which also works with MPS. I found that it runs more stably and consumes less memory. It works as follows:

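from transformers import AutoTokenizer, OPTForCausalLM

input_text = "The Transformer architecture [START_REF]"

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
# device_map="auto" (requires accelerate) places the weights on the available GPU(s)
model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b", device_map="auto")
# Tokenize the prompt and move the token IDs to the GPU
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

# Generate a continuation and decode it back to text
outputs = model.generate(input_ids)
r = tokenizer.decode(outputs[0])
print(r)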


The Transformer architecture [START_REF] Attention is all you need, Vaswani[END_REF] is a sequence-to-

For the Mac, you would specify MPS instead of CUDA: .to("mps"). You can also run on the CPU by simply using .input_ids without moving it to a device.
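In the decoded output, the citation sits between [START_REF] and [END_REF]. If you want to post-process many prompts, a regular expression can pull out just the reference text. A minimal sketch; `extract_references` is a hypothetical helper, and it only picks up references whose closing [END_REF] was actually generated, so truncated ones are skipped.

```python
import re
from typing import List

# Galactica wraps generated citations in [START_REF] ... [END_REF] markers.
REF_PATTERN = re.compile(r"\[START_REF\](.*?)\[END_REF\]", re.DOTALL)


def extract_references(generated_text: str) -> List[str]:
    """Return the text of every complete [START_REF]...[END_REF] span."""
    return [match.strip() for match in REF_PATTERN.findall(generated_text)]


output = "The Transformer architecture [START_REF] Attention is all you need, Vaswani[END_REF] is a sequence-to-"
print(extract_references(output))  # ['Attention is all you need, Vaswani']
```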


Running the prompt with the Standard model takes about a minute, depending on the hardware:

         Device           Execution time
------------------------ ----------------
i9-9920X                       55 s
NVIDIA GeForce 2080 Ti         68 s
Apple M1 Max (CPU)            105 s
Apple M1 Max (GPU)             29 s
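To take such measurements on your own machine, you can wrap the generation call in a wall-clock timer. The sketch below uses time.perf_counter; `timed` is a hypothetical helper, and the loop only re-expresses the table above relative to the M1 Max CPU run (the M1 GPU comes out roughly 3.6x faster than its CPU).

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Execution times from the table above, in seconds.
measured = {
    "i9-9920X": 55,
    "NVIDIA GeForce 2080 Ti": 68,
    "Apple M1 Max (CPU)": 105,
    "Apple M1 Max (GPU)": 29,
}
baseline = measured["Apple M1 Max (CPU)"]
for device, seconds in measured.items():
    print(f"{device:<24} {baseline / seconds:.2f}x vs. M1 Max CPU")
```

With the low-level example, usage would look like outputs, seconds = timed(model.generate, input_ids).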


Looking up references seems to work fine. Arguably, there is little advantage here: one can probably find relevant literature much faster with traditional search engines. So how about some more challenging questions? Let’s try a basic scientific question: How big is the nucleus of a cell? Google returns 233,000,000 results, and the first answer gives an approximate size of 6 µm with a reference to a Wikipedia article. This takes 0.6 s.

For Galactica, it is suggested to frame questions as “Question: [xxx] Answer:”. Let’s see what the results look like for “Question: How big is the nucleus of a cell? Answer:”.
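If you want to run many questions, the format can be generated programmatically. A minimal sketch; `question_prompt` is a hypothetical helper, and the single space between the parts simply follows the format as written above.

```python
def question_prompt(question: str) -> str:
    """Wrap a question in the "Question: ... Answer:" format suggested for Galactica."""
    # Assumption: a single space separates the parts, as written in the text above.
    return f"Question: {question.strip()} Answer:"


prompt = question_prompt("How big is the nucleus of a cell?")
print(prompt)  # Question: How big is the nucleus of a cell? Answer:
```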


(Run on Apple M1 with GPU)

  Model    Time (s)   Output for “Question: How large is the nucleus of a cell? Answer:”
---------- ---------- --------------------------------------------------------------------
Mini          7.5     100,0
Base         10.6     10000
Standard     29.2     The nucleus of a cell is

Clearly, the results are not very useful. Another test, this time without the Question/Answer format, used the modified prompt “How big is the nucleus of the cell?”:

The nucleus is where the genetic material is

This isn’t very meaningful either.

For comparison, I asked Lex, an AI writing tool based on GPT-3, the same question. It took approx. 5 s:

The size of a cell’s nucleus varies widely depending on the specific type of cell. Generally, most cell nuclei range from about 1 to 10 micrometers (μm) in size. Some nuclei may be as large as 100 µm.

This is clearly a better result.


Galactica provides special keywords that can be added to the prompt to tailor the output. We already used [START_REF] for references and Question: for questions, but there are many more, such as TLDR to summarize a text.
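A small prompt builder can keep such keyword conventions in one place. This is a sketch with hypothetical names (`PROMPT_BUILDERS`, `build_prompt`); the exact placement of the keywords is an assumption: [START_REF] is appended after a space as in the demo prompt, and TLDR is appended as its own paragraph followed by a colon.

```python
from typing import Callable, Dict

# Hypothetical prompt builders for the keywords mentioned above.
# Assumptions: [START_REF] follows the text after a space (as in the demo prompt),
# and TLDR is appended as a separate paragraph followed by a colon.
PROMPT_BUILDERS: Dict[str, Callable[[str], str]] = {
    "reference": lambda text: f"{text.rstrip()} [START_REF]",
    "tldr": lambda text: f"{text.rstrip()}\n\nTLDR:",
}


def build_prompt(kind: str, text: str) -> str:
    """Append the Galactica keyword for `kind` ("reference" or "tldr") to `text`."""
    if kind not in PROMPT_BUILDERS:
        raise ValueError(f"unknown prompt kind: {kind!r}")
    return PROMPT_BUILDERS[kind](text)


print(build_prompt("reference", "The Transformer architecture"))
```

Adding another keyword is then a matter of adding one entry to the dictionary.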

With some limitations, Galactica can be run on consumer-grade computers.

We were able to reproduce a demo prompt, and while the reference lookup works fine, the results for free text generation are highly questionable.

An AI writing tool seems to give more reasonable results. Admittedly, we couldn’t test the best model, Huge, because it is out of reach for consumer-grade hardware. I would argue that such hardware might be uneconomical even for some research labs. While we haven’t tested everything that is possible with Galactica and claimed in the paper, I think this glimpse gives some perspective on what to expect.

Fundamentally, it is debatable whether large language models are suited for such tasks. One line of thought argues that large models may be too large and that we end up with stochastic parrots instead of models that have actually learned something. The results presented here certainly support that argument.
