LLM deploy
RAG
Rewrite-Retrieve-Read This work introduces a new framework, Rewrite-Retrieve-Read1, which replaces the previous retrieve-then-read pipeline for retrieval-augmented LLMs from the perspective of query rewriting. In this framework, a small language model is adopted as a trainable rewriter to cater to the downstream LLM. Figure 1. Overview of the proposed pipeline: (a) the standard retrieve-then-read method, (b) an LLM as a query rewriter, (c) the pipeline with a trainable rewriter. (Image source: Query Rewriting for Retrieval-Augmented Large Language Models)...
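As a rough sketch of the rewrite-retrieve-read flow described above, the snippet below strings the three stages together; `small_rewriter`, `search`, and `reader_llm` are hypothetical stand-ins for the trainable rewriter, the retriever, and the frozen downstream LLM, not the paper's actual code.

```python
def rewrite_retrieve_read(question: str, small_rewriter, search, reader_llm) -> str:
    # rewrite: a small model turns the user question into a search-friendly query
    query = small_rewriter(f"Rewrite this question as a web search query: {question}")
    # retrieve: query the retriever with the rewritten query instead of the raw question
    docs = search(query, top_k=5)
    # read: the frozen downstream LLM answers conditioned on the retrieved context
    context = "\n\n".join(docs)
    return reader_llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```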
LLM Inference
Why LLM inference runs slowly. Excellent solutions: Cerebras. Figure 1. LLaMA3.1-70B inference speed with different solutions. (Image source: Artificial Analysis)
Align LLMs
After pretraining on vast datasets and supervised fine-tuning with diverse instruction sets, Large Language Models (LLMs) have achieved remarkable capabilities in text generation. However, although LLMs can generate seemingly reasonable sequences, free from grammatical errors and redundant words, they may still produce content that lacks truthfulness or accuracy. Are there any methods to mitigate these shortcomings? Researchers at OpenAI have framed these issues as the challenge of LLM alignment. Currently, one of the most prominent approaches to address these challenges is Reinforcement Learning from Human Feedback (RLHF)....
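To make the RLHF mention slightly more concrete, here is a minimal sketch of the pairwise loss commonly used to train the reward model on human preference pairs; it is an illustrative formulation, not OpenAI's implementation.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: push the reward of the human-preferred response
    above that of the rejected one for each comparison pair."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# toy usage with scalar rewards for two preference pairs
loss = reward_model_loss(torch.tensor([1.2, 0.3]), torch.tensor([0.4, 0.9]))
```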
Audio LLMs
AudioLM Based on SoundStream1 and w2v-BERT2, AudioLM3 proposes a framework that consists of three components: a tokenizer model, a decoder-only Transformer language model, and a detokenizer model. SoundStream is a neural audio codec with strong performance that converts input waveforms at 16 kHz into embeddings, while w2v-BERT plays the role of computing the semantic tokens. Figure 1. (Image source: AudioLM: a Language Modeling Approach to Audio Generation) Figure 2. The three stages of the hierarchical modeling of semantic and acoustic tokens in AudioLM: i) semantic modeling for long-term structural coherence, ii) coarse acoustic modeling conditioned on the semantic tokens, and iii) fine acoustic modeling....
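The three components can be pictured as the pipeline below; the method and attribute names are hypothetical placeholders meant only to show how semantic tokens, coarse acoustic tokens, and fine acoustic tokens flow through the stages.

```python
def generate_audio(prompt_waveform, soundstream, w2v_bert, lm):
    # stage i: semantic modeling for long-term structural coherence
    semantic = w2v_bert.tokenize(prompt_waveform)   # semantic tokens from w2v-BERT
    semantic = lm.continue_semantic(semantic)       # autoregressive continuation
    # stage ii: coarse acoustic modeling conditioned on the semantic tokens
    coarse = lm.coarse_acoustic(semantic)
    # stage iii: fine acoustic modeling
    fine = lm.fine_acoustic(coarse)
    # detokenizer: SoundStream decodes acoustic tokens back into a waveform
    return soundstream.decode(fine)
```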
LLM Agent
Open-source component memo: a memory module for AI agents that memorizes personal preferences, previous interactions, and business stages.
Data for LLMs
Training an LLM requires a large amount of high-quality data. Even though many tech giants have opened up their high-performance LLMs (e.g., LLaMA, Mistral), high-quality data still remains private. Chinese Datasets English Datasets RefinedWeb: 600B tokens. Dolma: open-sourced by allenai, contains 3T tokens and a toolkit with some key features: high performance, portability, a built-in tagger, fast deduplication, extensibility, and cloud support. fineweb: 15 trillion tokens of high-quality web data....
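For corpora of this scale, streaming is usually more practical than a full download; the sketch below streams a few fineweb rows with the Hugging Face `datasets` library (the dataset id and the `text` field are assumptions worth double-checking on the hub).

```python
from datasets import load_dataset

# stream instead of downloading all 15T tokens to disk
fw = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)  # assumed dataset id
for i, row in enumerate(fw):
    print(row["text"][:200])  # fineweb rows are assumed to expose a "text" field
    if i == 2:
        break
```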
Continual Pretraining
Large language models (LLMs) have already demonstrated significant achievements, and many startups plan to train their own LLMs. However, training an LLM from scratch remains a big challenge, both in terms of machine costs and the difficulty of data collection. Against this background, continual pretraining based on an open-source LLM is a reasonable alternative. First, determine the purpose of your continually pretrained LLM. In general, standard LLMs may not excel in specific domains like finance, law, or trade....
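As a minimal sketch of what continual pretraining on a domain corpus might look like with Hugging Face Transformers, assuming a local plain-text corpus and an open-source base checkpoint (the model name, file path, and hyperparameters below are illustrative only):

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"           # assumed open-source base checkpoint
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token               # causal LM tokenizers often lack a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# a plain-text domain corpus; the path is a placeholder
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
train = raw.map(lambda b: tok(b["text"], truncation=True, max_length=2048),
                batched=True, remove_columns=["text"])

args = TrainingArguments(output_dir="cpt-out", per_device_train_batch_size=1,
                         gradient_accumulation_steps=16, learning_rate=1e-5,
                         num_train_epochs=1, bf16=True)
Trainer(model=model, args=args, train_dataset=train,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False)).train()
```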
Image Generation
GLIDE
Video Generation
Recently, numerous AGI applications have caught the eye of almost everyone on the internet. Listed here are some advanced papers that elucidate their key principles and technologies. DiT The authors explore a new class of diffusion models based on the transformer architecture, Diffusion Transformers (DiTs)1. Before their work, using a U-Net backbone to generate the target image was prevalent rather than a transformer architecture. The authors experiment with variants of standard transformer blocks that incorporate conditioning via adaptive layer norm, cross-attention, and extra input tokens....
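The adaptive layer norm (adaLN) variant can be sketched roughly as follows: the conditioning vector (e.g., a timestep or class embedding) regresses per-block shift, scale, and gate parameters that modulate parameter-free LayerNorms. This is an illustrative PyTorch block, not the authors' reference code.

```python
import torch
import torch.nn as nn

class AdaLNBlock(nn.Module):
    """DiT-style transformer block with adaLN conditioning (illustrative)."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        # regress 6 modulation vectors: shift/scale/gate for attention and for the MLP
        self.ada = nn.Sequential(nn.SiLU(), nn.Linear(dim, 6 * dim))

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        sh1, sc1, g1, sh2, sc2, g2 = self.ada(cond).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + sc1.unsqueeze(1)) + sh1.unsqueeze(1)
        x = x + g1.unsqueeze(1) * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + sc2.unsqueeze(1)) + sh2.unsqueeze(1)
        return x + g2.unsqueeze(1) * self.mlp(h)
```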
Problems you may encounter during distributed training
Large Language Models (LLMs) have shown great promise in various artificial intelligence applications, and it is becoming a trend to train one. Nevertheless, even for many senior AI engineers, training these complex models remains a significant challenge. Listed here is a series of issues you may encounter. torch.distributed.barrier() stuck during training with multiple GPUs First, try setting the environment variable 'NCCL_P2P_DISABLE=1'. If that works, the underlying fix is probably to disable PCIe ACS in the BIOS....
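For the barrier issue above, the quick check is to disable NCCL peer-to-peer transport before the process group is created (or export the variable in the launch script); a minimal sketch:

```python
import os
# must be set before torch.distributed initializes (or exported in the shell)
os.environ["NCCL_P2P_DISABLE"] = "1"

import torch.distributed as dist

dist.init_process_group(backend="nccl")  # typically launched via torchrun, one process per GPU
dist.barrier()                           # the call that previously hung
```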
Evolution of Multimodality
With the swift development of deep neural networks, a multitude of models handling diverse information modalities like text, speech, images, and videos have proliferated. Among AI researchers, it is widely acknowledged that multimodality is the future of AI. Let's explore the advancements in multimodality in recent years. Texts & Images CLIP CLIP (Radford et al., 20211) argues that learning directly from raw text about images is a promising alternative that leverages a much broader source of supervision....
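At its core, CLIP trains image and text encoders with a symmetric contrastive objective over a batch of matched pairs; a minimal sketch of that loss (illustrative, not the released implementation):

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb: torch.Tensor, text_emb: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over N paired (image, text) embeddings of shape (N, D)."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature             # (N, N) cosine similarities
    targets = torch.arange(len(logits), device=logits.device)   # matched pairs on the diagonal
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```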
Positional Encoding in Transformer
With the advancement of large language models (LLMs), the significance of the context length they can handle is increasingly apparent. Let's take a look at the evolution of positional encoding over the years to enhance the context-processing capability of LLMs. Vanilla Positional Encoding Why does the Transformer need positional encoding? The Transformer contains no recurrence and no convolution. To help the model utilize the order of the sequence, the vanilla Transformer (Vaswani et al....
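For reference, the vanilla sinusoidal encoding can be written in a few lines (a sketch assuming an even `d_model`):

```python
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)            # even dimension indices
    angles = pos / torch.pow(10000.0, i / d_model)                  # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe
```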
Know about diffusion models
What Are Generative Models? A generative model can be seen as a way to model the conditional probability of the observed $X$ given a target $y$ (e.g., given the target 'dog', generate a picture of a dog). Once trained, we can easily sample an instance of $X$. While training a generative model is significantly more challenging than training a discriminative model (e.g., it is more difficult to generate an image of a dog than to identify a dog in a picture), it offers the ability to create entirely new data....
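A toy illustration of this conditional view, with made-up one-dimensional data: estimate $p(X \mid y)$ per class with a Gaussian, then sample a brand-new $X$ for a requested label.

```python
import numpy as np

rng = np.random.default_rng(0)
# fabricated 1-D "training data" for two labels
data = {"dog": rng.normal(loc=2.0, scale=0.5, size=500),
        "cat": rng.normal(loc=-1.0, scale=0.8, size=500)}

# "training": estimate a Gaussian conditional distribution per class
params = {y: (x.mean(), x.std()) for y, x in data.items()}

# generation: given the target y = "dog", sample a new instance of X
mu, sigma = params["dog"]
new_sample = rng.normal(mu, sigma)
```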
Visual LLMs
Techniques have improved not only on text data but also in computer vision recently. Here we focus on Visual Language Models (VLMs) based on transformers. In the beginning, some researchers tried to extend BERT to process visual data and succeeded. For example, visual-BERT and ViL-BERT achieve strong performance on many visual tasks by training on two different objectives: 1) a masked modeling task that aims to predict the missing part of a given input; and 2) a matching task that aims to predict whether the text and the image content are matched....
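Those two pretraining objectives are typically combined into a single loss; a rough sketch (the tensor names and shapes are hypothetical model outputs, not the papers' code):

```python
import torch
import torch.nn.functional as F

def pretraining_loss(token_logits, masked_token_targets, match_logits, match_labels):
    """(1) masked modeling: predict the masked tokens of the input;
    (2) image-text matching: binary prediction of whether the caption matches the image."""
    mlm_loss = F.cross_entropy(token_logits, masked_token_targets, ignore_index=-100)
    itm_loss = F.binary_cross_entropy_with_logits(match_logits, match_labels.float())
    return mlm_loss + itm_loss
```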