Technology Sharing

Applications of Large Language Models: AI Engineering Implementation

2024-07-11



The rapid development of AI in recent years has certainly made an impact, yet AI has not truly crossed over into the mainstream; it is still largely "entertaining itself" within a small circle. Even so, things are very different from the past.
This article focuses on the current state of large models and discusses the issues around engineering implementation. It is as much a set of informal impressions as a summary.

We will not go too deep into AI itself here; the focus is on upper-layer applications.

Overview of Large Language Models

When we talk about large language models, we mean software that can “speak” in a way that resembles human language. These models are remarkable: they can take in context and generate responses that are not only coherent but feel like they come from a real human.
These language models work by analyzing large amounts of text data and learning patterns in language usage. They use those patterns to generate text that is nearly indistinguishable from what humans say or write.
If you have ever chatted with a virtual assistant or interacted with an AI customer service agent, you may have interacted with a large language model without even realizing it. These models have a wide range of applications, from chatbots to language translation to content creation.

What is a large language model?

  • Definition: A large language model (LLM) is a pre-trained natural language processing (NLP) model, usually with billions or even hundreds of billions of parameters, that can understand and generate natural language text. A mature large language model is trained on a massive corpus.
  • Function: Large language models can perform a variety of language tasks, such as text classification, sentiment analysis, machine translation, text summarization, and question answering.
  • Technical foundation: Based on the Transformer architecture, using the self-attention mechanism to process sequence data (see the sketch after this list).
  • Development: From the early RNN and LSTM to today's BERT, GPT, and other models, parameter counts and performance keep improving.
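To make the self-attention bullet above concrete, here is a minimal numpy sketch of single-head scaled dot-product self-attention. The shapes and random weights are illustrative stand-ins, not any real model's parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise token affinities
    weights = softmax(scores, axis=-1)       # each row is a distribution over tokens
    return weights @ V                       # each token becomes a weighted mix of values

# Toy usage: a "sentence" of 4 tokens, model width 8, head width 4
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # -> (4, 4)
```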

What is Machine Learning?

  • Definition: Machine learning is a branch of artificial intelligence that enables computer systems to learn from data and make decisions or predictions without being explicitly programmed.
  • Types: Supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
  • Applications: Widely used in image recognition, speech recognition, recommendation systems, predictive analytics, and other fields.
  • Key concepts: Feature selection, model training, overfitting and underfitting, model evaluation, etc.

What is Deep Learning?

  • Definition: Deep learning is a subset of machine learning that uses neural network structures loosely inspired by the human brain to learn complex patterns in data through multiple (deep) layers of nonlinear transformations.
  • Core components: Neural network layers, activation functions, loss functions, optimization algorithms.
  • Architectures: Convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM), and the Transformer, among others.
  • Applications: Revolutionary progress in image and speech recognition, natural language processing, autonomous driving, and other fields.

Understanding Large Language Models

Why open a separate chapter on "understanding" large language models after the overview above? Because it helps you grasp what a large language model really is and where its ceiling lies, which in turn helps us build better application-layer products.
First, in general terms, machine learning is about finding a special, complex "function" that converts our input into the desired output. For example, if we expect input 1 to produce output 5, and input 2 to produce output 10, then the function might be y = 5x. Likewise, we might input a picture of a cat and want the output "cat", or input "hi" and want the output "hello".

This can actually be seen as a mathematical problem, although real models are of course far more complicated than the example above; a toy sketch of the "find the function from examples" idea follows.
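The sketch below recovers y = 5x from the two example pairs in the paragraph above using closed-form least squares; the numbers are just those from the example.

```python
import numpy as np

# The two observed examples: input 1 -> output 5, input 2 -> output 10
x = np.array([1.0, 2.0])
y = np.array([5.0, 10.0])

# Closed-form least squares for a single weight w in y = w * x:
# w = (x . y) / (x . x)
w = (x @ y) / (x @ x)
print(w)       # 5.0  -> the "learned function" is y = 5 * x
print(w * 3)   # 15.0 -> a prediction for an unseen input
```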

History

1. In the early days, people wanted machines to think like humans, and the dominant approach was the bionics-inspired "fly like a bird" school: just as people learned from birds that flapping wings produces flight, they hoped to produce intelligence by imitating the brain. The results were poor. Machines lacked "world knowledge" (the well-known, instinctive cognition in your head that requires no conscious thought, such as "water flows downhill"); such knowledge is vast, and problems like words with multiple meanings were hard to solve. In short, imitating the neurons of the human brain is too complex to achieve with code and functions alone.

2. The AI 2.0 era: data-driven "statistical artificial intelligence". Why did large models spring up like mushrooms after GPT-3 appeared? Most companies had in fact been researching AI for a long time, but in the early days everyone was crossing the river by feeling for the stones: there were many plans and ideas, but no one dared to go all-in, so research stayed within limited scopes. GPT-3 showed everyone that one particular method works: compute statistics over massive amounts of data, and let quantitative change produce qualitative change. Once there was a success story, companies realized the path was viable, increased investment, and followed it.

3. Big data enables a leap in machine intelligence; the greatest significance of using large amounts of data is to let computers accomplish things that, in the past, only humans could.

  • Core idea: "train parameters" on the statistical information in large amounts of data to fit the desired results (essentially "statistical" rather than "bionic")
  • Main advantage: as data accumulates, the system keeps improving and gets better and better
  • Core element: "big data" that is large-scale, multi-dimensional, and comprehensive
  • "Rote memorization" over massive, multi-dimensional, comprehensive big data: statistical AI turns the "intelligence problem" into a "data problem", letting computers solve "uncertainty problems" by learning from big data

The essence

So the key becomes a probability problem. Today's large models compute, from massive amounts of data, which next word (or which span in the middle) has the highest probability, and then output it. In essence, they do not generate genuinely new things; they infer.

For example, if you ask it "What is the capital of China?", the model extracts the key phrase and has learned from massive data that the word most likely to follow is "Beijing", so it outputs the correct result.

Large models achieve their current capabilities by "memorizing" massive amounts of data.
The quality of the training data is therefore critical, and from it we can also roughly infer the upper limit of large models, as the toy sketch below illustrates.
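Here is a toy sketch of the "output the most probable next word" idea. The probability table is made up for illustration, standing in for the statistics a real model computes from its training data.

```python
# A made-up probability table standing in for the model's real distribution,
# which is computed from statistics over massive training data.
next_token_probs = {"Beijing": 0.92, "Shanghai": 0.05, "Tokyo": 0.02, "banana": 0.01}

prompt = "The capital of China is"
next_token = max(next_token_probs, key=next_token_probs.get)  # greedy decoding
print(prompt, next_token)  # -> The capital of China is Beijing
```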

AIGC system

AIGC, or Artificial Intelligence Generated Content, is a technology that uses machine learning algorithms to automatically generate various types of content, including text, images, audio, and video. The AIGC system analyzes large amounts of data, learns language, visual, and audio patterns, and is able to create new content that is similar or even indistinguishable from human-generated content.
All digital work is likely to be disrupted by large models.
Most of our current application-layer work belongs to the AIGC category.
Since GPT-3.5, large models have been able to use tools:
• Plugins and web access: make up for the large model's lack of up-to-date memory, marking the point where LLMs officially began learning to use tools
• Function calling: the LLM learns to call APIs to complete complex tasks, which is core backend-engineering work (for example, giving Gorilla an instruction can automatically invoke models such as diffusion models for multimodal tasks like drawing and dialogue); a sketch follows this list
• Letting the model "think": guiding the large model toward logical reasoning, the core being "planning, memory, and tools"
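Below is a hedged sketch of function calling using the OpenAI Python SDK's tool-calling interface (openai>=1.0). The get_weather function and its schema are hypothetical examples, not part of any real service, and the sketch assumes the model chooses to call the tool.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def get_weather(city: str) -> str:
    # Hypothetical tool: a stand-in for a real weather API call
    return f"Sunny, 25C in {city}"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Beijing?"}],
    tools=tools,
)

# Assumes the model decided to call the tool; a robust app would check first
call = resp.choices[0].message.tool_calls[0]
if call.function.name == "get_weather":
    args = json.loads(call.function.arguments)
    print(get_weather(**args))  # the model chose the tool and its arguments
```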

Implementation of AI engineering projects

In fact, implementing an AI project is much like implementing an ordinary one: the core is to understand what problem the project is meant to solve, then expand from there into requirements analysis, technology selection, and so on. At the application layer we do not design large models ourselves; we usually call APIs directly or deploy open-source models locally.

Implementation approaches

Prompt Project (Phase 1)

Anyone who has had some contact with AI probably knows about prompts. In 2022-2023, early work with AI largely revolved around them: how to phrase questions so the AI better understands your intent, attends to your key points, and gives higher-quality answers.
The barrier to entry is relatively low, and most large-model applications are built around prompt design. It can meet some needs, depending on the capability of the underlying model.

RAG Search (Phase 2)

RAG (Retrieval-Augmented Generation) is an AI technology that combines retrieval models and generative models. It enhances the answering ability of large language models (LLMs) by retrieving relevant information from a knowledge base or database and combining it with user queries. RAG technology can improve the accuracy and relevance of AI applications, especially in scenarios that deal with specific domain knowledge or require the latest information.
The working principle of RAG mainly includes two steps:

  1. Retrieval: Based on the user's query, RAG uses a retrieval model to search the knowledge base and extract the most relevant information or documents.
  2. Generation: The retrieved information is fed to the generation model together with the user's query, and the model produces the answer or content from it.

The advantages of RAG technology are (a minimal sketch of the two steps follows this list):

  • Knowledge updating: access to the latest information, not just the knowledge frozen in at training time
  • Reducing hallucinations: external knowledge sources reduce the LLM's tendency to generate inaccurate or fabricated information
  • Data security: enterprises can use private data without uploading it to third-party platforms
  • Cost-effectiveness: RAG is a more economical solution than retraining or fine-tuning large models
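Here is a minimal retrieve-then-generate sketch of the two steps, assuming the OpenAI Python SDK. The two-document "knowledge base" and the keyword-overlap scorer are deliberately naive stand-ins for a real vector store and embedding search.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Tiny in-memory "knowledge base": a stand-in for a real document store
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9:00-18:00, Monday through Friday.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Step 1 (Retrieval): naive keyword overlap instead of vector search
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def answer(query: str) -> str:
    # Step 2 (Generation): stuff the retrieved context into the prompt
    context = "\n".join(retrieve(query))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

print(answer("How long do I have to return a product?"))
```
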
Training a specific function model (Phase 3)

However, this phase has a relatively high threshold, with real requirements for computing power, data, and algorithms.

Business design for implementation

Step 1: Ideation and exploration

Goal: Conduct feasibility validation, design a prototype based on business requirements, and build a PromptFlow to test key assumptions

  • Core input: Clear business objectives
  • Key output: Verification of whether the large language model (LLM) can meet the task requirements, confirming or refuting the key assumptions
  • Key Action Plans:
    • Clearly define the business use case
    • Select a suitable base model and prepare the necessary data for subsequent fine-tuning (SFT) or other purposes
    • Design and build a PromptFlow to generate and test feasible hypotheses
Step 2: Build and enhance

Objective: Evaluate the robustness of the solution on a wider range of datasets and enhance model performance through techniques such as fine-tuning (SFT) and retrieval-augmented generation (RAG)

  • Core input: Business objectives combined with preliminary plan (output from step 1)
  • Key Output: A mature business solution, ready to be deployed to production systems
  • Key Action Plans:
    • Verify the effectiveness of PromptFlow on sample data
    • Evaluate and optimize PromptFlow, explore better prompts and tools
    • If the expected goal is achieved, expand to a larger data set for testing, and further improve the effect through SFT, RAG and other technologies
Step 3: Continued Operation

Objective: Ensure stable operation of the AIGC system, integrate monitoring and alerting, and achieve continuous integration and continuous deployment (CI/CD)

  • Core input: an AIGC system that can solve a specific problem
  • Key Output: Production-grade applications with integrated monitoring, alerting systems, and CI/CD pipelines.
  • Key Action Plans:
    • Deploy the AIGC system
    • Integrate monitoring and alerting to ensure these capabilities are embedded in the application
    • Establish application operation mechanisms, including continuous iteration, deployment, and updates

Through this process, we ensure that every step from proof of concept to production deployment is precise, controllable, and business-oriented.

Prompt Technology

1. The driving role of the main content snippet

A main-content snippet is the body of text that works together with an instruction, and it can significantly increase the instruction's effectiveness.

  1. Definition of main content:
    • The main content is the core text that the model processes or transforms, usually paired with an instruction to achieve a specific goal.
  2. Application examples:
    • Example 1: Provide a passage of Wikipedia text [text] with the instruction "Please summarize the above content."
    • Example 2: Given a table containing beer information [text], the instruction is "List all beers in the table with an alcohol content less than 6 degrees." (A sketch of assembling such a prompt follows this list.)
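Here is a sketch of assembling Example 2 as an actual prompt string; the table contents are invented for illustration.

```python
# The main content (the table) comes first, and the instruction is appended
# after it; the combined string is what gets sent to the model.
main_content = (
    "Name      | ABV\n"
    "Pale Ale  | 5.2\n"
    "Tripel    | 9.0\n"
    "Lager     | 4.8"
)  # stand-in for the beer table [text] from Example 2

instruction = "List all beers in the table with an alcohol content less than 6 degrees."
prompt = f"{main_content}\n\n{instruction}"
print(prompt)
```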

2. Implementation strategies for main contents

Specific methods to achieve the main content include:

  • Example: By providing examples of completing a task rather than direct instructions, the model can be asked to infer the actions that need to be performed.
  • Cue: Use instructions with clues to guide the model to reason step by step to arrive at the answer.
  • Templates: Provide reusable prompt recipes with placeholders, allowing customization to specific use cases.

3. The power of examples

By showing the model how to generate outputs given instructions, the model is able to infer the output pattern, whether it is zero-shot, one-shot, or few-shot learning.

  • Components:
    • An overall task description.
    • A series of examples of the desired output.
    • A cue for the new example, serving as the starting point for the task (assembled in the sketch after this list).
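A sketch assembling those three components into a few-shot prompt; the sentiment-classification task and the reviews are invented for illustration.

```python
# The three components in order: task description, worked examples, and a
# cue for the new case that the model is expected to complete.
prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day. -> Positive
Review: The screen cracked within a week. -> Negative
Review: Shipping was fast and the fit is perfect. ->"""
print(prompt)  # the model should continue the pattern with " Positive"
```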

4. The Guiding Role of Clues (Cue)

By providing cues to the large model, we guide it to reason in a clear direction, similar to providing a step-by-step formula that helps the model arrive at the answer gradually, as in the sketch below.
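A sketch of such a cue; the arithmetic question is invented for illustration.

```python
# The answer is seeded with the first reasoning step, nudging the model to
# continue along that path instead of jumping straight to a guess.
prompt = (
    "Q: A store sells pens at 3 yuan each. How much do 4 pens cost?\n"
    "A: Let's reason step by step. Each pen costs 3 yuan, so 4 pens cost"
)
# Expected continuation: "3 * 4 = 12 yuan."
print(prompt)
```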

5. Customization value of templates (Template)

The value of templates lies in creating and publishing prompt libraries for specific application areas, where the prompt templates have already been optimized for the application's specific context or examples.

  • Optimization tip: Make responses more relevant and accurate for your target user group.
  • Resource reference: The OpenAI API examples page provides a wealth of template resources.
  • Model role assignment: Enhance the model's understanding of task relevance by assigning identity roles (such as system, user, assistant); a template sketch follows this list.
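A sketch of a reusable template with placeholders, combined with the role assignment mentioned above; the domain, audience, and system text are invented for illustration.

```python
# A reusable prompt template with placeholders, customized per use case,
# plus role assignment via the chat message format.
template = (
    "You are a {domain} expert. Summarize the following for a {audience} "
    "audience:\n{text}"
)

messages = [
    {"role": "system", "content": "You answer concisely and accurately."},
    {"role": "user", "content": template.format(
        domain="finance",
        audience="beginner",
        text="(document to summarize goes here)",
    )},
]
print(messages)
```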

Advanced prompt example

# Job description: Data analysis assistant
## Role
My primary goal is to provide users with expert-level data analysis advice. Drawing on detailed data resources, tell me which stock you want to analyze (provide the ticker symbol). Acting as an expert, I will perform fundamental analysis, technical analysis, market sentiment analysis, and macroeconomic analysis for your stock.
## Skills
### Skill 1: Use Yahoo Finance's 'Ticker' to look up stock information
### Skill 2: Use 'News' to search for the latest news about the target company
### Skill 3: Use 'Analytics' to search for the target company's financial data and analysis
## Workflow
Ask the user which stocks need to be analyzed, then perform the following analyses in order:
**Part 1: Fundamental analysis: financial report analysis
*Objective 1: Conduct an in-depth analysis of the target company's financial situation.
*Steps:
1. Identify the analysis target: