[AI large model application development] A very simple introduction to AI knowledge graphs: a step-by-step guide to creating and querying knowledge graphs with LangChain (with code and source code analysis)

2024-07-12

Recently, it has become increasingly popular to use graph databases or knowledge graphs in large model applications. Graphs have a natural advantage in representing and storing diverse and interrelated information, and can easily capture complex relationships and properties between different data types, thereby better providing context or data support for large models. This article will take a look at how to use graph databases or knowledge graphs in large model applications.

This article is just a simple introduction and hands-on walkthrough. It doesn't matter if you don't know graph databases or neo4j; just follow the steps in this article. It can help you understand how to use knowledge graphs in RAG. With this understanding, you can learn more about graph databases later if you need to.

0. What is a knowledge graph?

0.1 Concept

A knowledge graph is a structured semantic knowledge base that stores and represents entities (such as people, places, organizations, etc.) and relationships between entities (such as relationships between people and geographical locations) in the form of graphs. Knowledge graphs are often used to enhance the semantic understanding capabilities of search engines, providing richer information and more accurate search results.

The main features of the knowledge graph include:

1. Entity: The basic unit in the knowledge graph, representing an object or concept in the real world.

2. Relation: A relationship between entities, such as "belongs to", "is located in", "created by", etc.

3. Attribute: Descriptive information about an entity, such as a person's age or the longitude and latitude of a location.

4. Graph Structure: A knowledge graph organizes data in the form of a graph, which contains nodes (entities) and edges (relationships).

5. Semantic Network: The knowledge graph can be regarded as a semantic network in which both nodes and edges have semantic meanings.

6. Inference: Knowledge graphs can be used for reasoning, that is, to derive new information from known entities and relationships.
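To make these features concrete, here is a minimal sketch in plain Python, with no graph database involved. The entities, the relation names, and the rule that `located_in` is transitive are all illustrative assumptions, not part of any library.

```python
from collections import defaultdict

# Entities (nodes) with attributes; all names and values are illustrative.
entities = {
    "Alice": {"type": "Person", "age": 30},
    "Paris": {"type": "City"},
    "France": {"type": "Country"},
}

# Relations (edges) stored as (subject, relation, object) triples.
triples = [
    ("Alice", "lives_in", "Paris"),
    ("Paris", "located_in", "France"),
]

# Graph structure: index outgoing edges by (subject, relation).
outgoing = defaultdict(set)
for subj, rel, obj in triples:
    outgoing[(subj, rel)].add(obj)


def located_in_closure(place):
    """Inference: follow located_in edges transitively from a place."""
    found, stack = set(), [place]
    while stack:
        current = stack.pop()
        for parent in outgoing[(current, "located_in")]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def lives_in_regions(person):
    """Derive every region a person lives in, including implied ones."""
    regions = set()
    for city in outgoing[(person, "lives_in")]:
        regions.add(city)
        regions |= located_in_closure(city)
    return regions


print(sorted(lives_in_regions("Alice")))
```

The fact that Alice lives in France is never stored; it is derived from the two stored triples, which is the kind of inference item 6 refers to.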

Knowledge graphs are widely used in search engine optimization (SEO), recommendation systems, natural language processing (NLP), data mining, etc. For example, Google's Knowledge Graph, Wikidata, DBpedia, etc. are all well-known knowledge graph examples.

0.2 The significance of knowledge graphs

As a form of data organization, a knowledge graph provides an efficient and intuitive way to represent and manage complex data relationships. It presents data in structured form through nodes and edges, strengthens the semantic expressiveness of the data, and makes the relationships between entities explicit. Knowledge graphs can improve the accuracy of information retrieval, especially in natural language processing, where they help machines better understand and respond to complex user queries. They play a core role in intelligent applications such as recommendation systems and intelligent question answering.

After the boring introduction, let’s take a look at the case of RAG+knowledge graph and implement it ourselves.

The following example comes from LangChain’s official documentation: https://python.langchain.com/v0.1/docs/integrations/graphs/neo4j_cypher/#refresh-graph-schema-information

1. Get started with coding

1.1 Preliminary Preparation

(1) First, you need to install a graph database. Here we use neo4j.

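Python installation command (the later examples also import LangChain, so its packages are installed here as well):

pip install neo4j langchain langchain-community langchain-openai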

(2) Register an account on the official website, log in, and then create a database instance. (If you are using it for learning, just choose the free one.)

After creating an online database instance, the page is as follows:

Now you can use this database in your code.

1.2 Code Practice

(1) After creating the database instance, you should have the connection URI, username, and password of the database. As usual, put them in environment variables and then load them in Python:

import os

# Read the Neo4j connection settings from environment variables
neo4j_url = os.getenv('NEO4J_URI')
neo4j_username = os.getenv('NEO4J_USERNAME')
neo4j_password = os.getenv('NEO4J_PASSWORD')
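Note that os.getenv returns None when a variable is not set, so a typo in a variable name only surfaces later as a connection error. One possible way to fail early is sketched below; the helper name `load_neo4j_settings` is hypothetical and the example values are placeholders, not a real instance.

```python
import os


def load_neo4j_settings(env=os.environ):
    """Read the three connection settings and fail early if any is missing."""
    names = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Demonstration with a plain dict standing in for the real environment.
example_env = {
    "NEO4J_URI": "neo4j+s://example.invalid",  # placeholder, not a real instance
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "secret",
}
settings = load_neo4j_settings(example_env)
print(settings["NEO4J_URI"])
```

In real use the function would be called without arguments so that it reads from os.environ.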

(2) Connect to the database

LangChain wraps the neo4j interface, so we only need to import the Neo4jGraph class to use it.

from langchain_community.graphs import Neo4jGraph

# Open a connection to the Neo4j instance
graph = Neo4jGraph(url=neo4j_url, username=neo4j_username, password=neo4j_password)

(3) Populate the database with a query

The query method runs a statement and returns the results. Statements are written in the Cypher query language.

result = graph.query(
    """
MERGE (m:Movie {name:"Top Gun", runtime: 120})
WITH m
UNWIND ["Tom Cruise", "Val Kilmer", "Anthony Edwards", "Meg Ryan"] AS actor
MERGE (a:Actor {name:actor})
MERGE (a)-[:ACTED_IN]->(m)
"""
)

print(result)

# Output: []

The output of the above code is [], because the statement only writes data and has no RETURN clause.
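MERGE behaves like "match or create": it reuses a node or relationship that already matches the pattern and creates one only if none exists, so rerunning the statement does not duplicate data. The sketch below imitates that idea in plain Python. It is a toy model and not how Neo4j is implemented; `merge_node` and `merge_relationship` are hypothetical helpers, and the toy matches on label and name only, whereas Neo4j matches on the full pattern including every listed property.

```python
# Toy in-memory store: nodes keyed by (label, name), relationships as a set.
nodes = {}
relationships = set()


def merge_node(label, name, **properties):
    """Return the existing node for (label, name) or create it."""
    key = (label, name)
    if key not in nodes:
        nodes[key] = {"name": name, **properties}
    return key


def merge_relationship(start, rel_type, end):
    """Add the relationship only if it is not already present."""
    relationships.add((start, rel_type, end))


def load_top_gun():
    """Mirror the structure of the Cypher statement above."""
    movie = merge_node("Movie", "Top Gun", runtime=120)
    for actor in ["Tom Cruise", "Val Kilmer", "Anthony Edwards", "Meg Ryan"]:
        person = merge_node("Actor", actor)
        merge_relationship(person, "ACTED_IN", movie)


load_top_gun()
load_top_gun()  # running it a second time changes nothing
print(len(nodes), len(relationships))
```

After both calls the store still holds one movie, four actors, and four ACTED_IN relationships.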

(4) Refresh the graph schema information

graph.refresh_schema()
print(graph.schema)

From the results, we can see that the schema describes the node types, their properties, and the relationships between node types. In other words, it is the structure of the graph.
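As a rough illustration of what a schema summarizes, the sketch below derives node properties and relationship patterns from a small in-memory copy of the data. This is not how Neo4jGraph computes its schema; the data layout, the type-name mapping, and the `summarize_schema` helper are assumptions made for illustration.

```python
# Nodes as (label, properties); relationships as (start index, type, end index).
nodes = [
    ("Movie", {"name": "Top Gun", "runtime": 120}),
    ("Actor", {"name": "Tom Cruise"}),
    ("Actor", {"name": "Val Kilmer"}),
]
relationships = [(1, "ACTED_IN", 0), (2, "ACTED_IN", 0)]

# Map Python types to schema-style type names (an illustrative convention).
TYPE_NAMES = {str: "STRING", int: "INTEGER", float: "FLOAT", bool: "BOOLEAN"}


def summarize_schema(nodes, relationships):
    """Collect property types per label and the label-to-label patterns."""
    node_props = {}
    for label, props in nodes:
        for key, value in props.items():
            node_props.setdefault(label, {})[key] = TYPE_NAMES[type(value)]
    patterns = sorted(
        {(nodes[s][0], rel, nodes[e][0]) for s, rel, e in relationships}
    )
    return node_props, patterns


node_props, patterns = summarize_schema(nodes, relationships)
for label, props in sorted(node_props.items()):
    print(label, props)
for start, rel, end in patterns:
    print(f"(:{start})-[:{rel}]->(:{end})")
```

The point is that the schema describes types and patterns rather than individual records: two actor nodes collapse into a single Actor entry.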

We can also log in to the neo4j web page to view the data stored in the graph database:

(5) Now that we have data in the graph database, we can query it.

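LangChain provides the GraphCypherQAChain class, which makes it easy to query the graph database in natural language:

from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI

# Build a chain that turns a question into Cypher, runs it, and phrases the answer.
# ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable.
chain = GraphCypherQAChain.from_llm(
    ChatOpenAI(temperature=0), graph=graph, verbose=True
)

result = chain.invoke({"query": "Who played in Top Gun?"})
print(result)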

Execution process and results:

First, the large model converts the natural-language question (Who played in Top Gun?) into a graph query statement. neo4j then executes the query and returns the results. Finally, the large model converts the results back into natural language and outputs them to the user.
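These three steps can be sketched with stand-in functions. Everything here is a stub: `fake_cypher_llm`, `fake_graph_query`, and `fake_qa_llm` are hypothetical placeholders for the model and database calls, with hard-coded return values, so the example runs without an API key or a Neo4j instance.

```python
def fake_cypher_llm(question, schema):
    """Stand-in for the model call that turns a question into Cypher."""
    return (
        "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie {name: 'Top Gun'}) "
        "RETURN a.name"
    )


def fake_graph_query(cypher):
    """Stand-in for executing the query; returns rows as dictionaries."""
    return [
        {"a.name": "Tom Cruise"},
        {"a.name": "Val Kilmer"},
        {"a.name": "Anthony Edwards"},
        {"a.name": "Meg Ryan"},
    ]


def fake_qa_llm(question, context):
    """Stand-in for the model call that phrases the rows as an answer."""
    names = ", ".join(row["a.name"] for row in context)
    return f"{names} played in Top Gun."


def answer(question, schema):
    cypher = fake_cypher_llm(question, schema)  # 1. question -> Cypher
    context = fake_graph_query(cypher)          # 2. Cypher -> rows
    return fake_qa_llm(question, context)       # 3. rows -> natural language


print(answer("Who played in Top Gun?", schema="(:Actor)-[:ACTED_IN]->(:Movie)"))
```

The schema is passed into the first step because the model needs to know which labels and relationship types exist before it can write a valid query.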

2. Expand your knowledge

2.1 GraphCypherQAChain Parameters

In the above code, we use LangChain's GraphCypherQAChain class, a question-answering Chain for querying graph databases. It has many parameters that can be set. For example, exclude_types specifies which node types or relationship types to ignore:

chain = GraphCypherQAChain.from_llm(
    graph=graph,
    cypher_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo"),
    qa_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k"),
    verbose=True,
    exclude_types=["Movie"],
)

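Inspecting the schema that the chain uses (for example with print(chain.graph_schema)) gives output similar to the following. The Movie node type is gone, and so is the relationship that pointed to it:

Node properties are the following:
Actor {name: STRING}
Relationship properties are the following:

The relationships are the following: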
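One way to picture the effect of exclude_types is as a filter applied to the schema before it is handed to the model: excluded labels disappear, and so does any relationship pattern that touches them. The function below is a hypothetical sketch of that idea in plain Python, not LangChain's implementation, and the structured schema it takes as input is an assumed layout.

```python
def filter_schema(node_props, patterns, exclude_types):
    """Drop excluded labels and any relationship pattern that uses them."""
    excluded = set(exclude_types)
    kept_nodes = {
        label: props
        for label, props in node_props.items()
        if label not in excluded
    }
    kept_patterns = [
        (start, rel, end)
        for start, rel, end in patterns
        if start not in excluded and end not in excluded and rel not in excluded
    ]
    return kept_nodes, kept_patterns


node_props = {
    "Movie": {"name": "STRING", "runtime": "INTEGER"},
    "Actor": {"name": "STRING"},
}
patterns = [("Actor", "ACTED_IN", "Movie")]

kept_nodes, kept_patterns = filter_schema(node_props, patterns, ["Movie"])
print(kept_nodes)
print(kept_patterns)
```

With Movie excluded, only the Actor properties remain and the relationship list is empty, which matches the shape of the output shown above.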

There are many similar parameters available, please refer to the official documentation: https://python.langchain.com/v0.1/docs/integrations/graphs/neo4j_cypher/#use-separate-llms-for-cypher-and-answer-generation

2.2 GraphCypherQAChain Execution Source Code

Below is the source code of GraphCypherQAChain. Let’s take a quick look at its execution process.

(1) cypher_generation_chain: Converts the natural-language question into a graph query statement.

(2) extract_cypher: Extracts the query statement from the model output, because the large model may return extra description text around it that needs to be removed.

(3) cypher_query_corrector: Corrects the query statement.

(4) graph.query: Executes the query statement against the graph database and obtains the results.

(5) self.qa_chain: Based on the original question and the query results, the large model is used again to organize the answer and output it to the user in natural language.

def _call(
    self,
    inputs: Dict[str, Any],
    run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Dict[str, Any]:
    """Generate Cypher statement, use it to look up in db and answer question."""  
    ......  
  
    generated_cypher = self.cypher_generation_chain.run(  
        {"question": question, "schema": self.graph_schema}, callbacks=callbacks  
    )  
  
    # Extract Cypher code if it is wrapped in backticks  
    generated_cypher = extract_cypher(generated_cypher)  
  
    # Correct Cypher query if enabled  
    if self.cypher_query_corrector:  
        generated_cypher = self.cypher_query_corrector(generated_cypher)  
  
    ......  
  
    # Retrieve and limit the number of results  
    # Generated Cypher be null if query corrector identifies invalid schema  
    if generated_cypher:  
        context = self.graph.query(generated_cypher)[: self.top_k]  
    else:  
        context = []  
  
    if self.return_direct:  
        final_result = context  
    else:  
        ......  
  
        result = self.qa_chain(  
            {"question": question, "context": context},  
            callbacks=callbacks,  
        )  
        final_result = result[self.qa_chain.output_key]  
  
    chain_result: Dict[str, Any] = {self.output_key: final_result}  
    ......  
  
    return chain_result
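Step (2) matters because models often wrap the query in a Markdown code fence or add commentary around it. Below is a minimal sketch of such an extraction step, assuming the query sits between triple backticks. The function name `extract_fenced_query` is hypothetical, and this is not necessarily how LangChain's extract_cypher is written.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
FENCED_BLOCK = re.compile(FENCE + r"(?:cypher)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_fenced_query(text):
    """Return the first fenced block's content, or the text itself if none."""
    match = FENCED_BLOCK.search(text)
    return match.group(1).strip() if match else text.strip()


reply = (
    "Here is the query:\n"
    + FENCE + "cypher\n"
    + "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name\n"
    + FENCE
)
print(extract_fenced_query(reply))
print(extract_fenced_query("MATCH (n) RETURN n"))
```

When no fence is present, the text is returned unchanged apart from surrounding whitespace, so a model that answers with a bare query still works.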
