GraphRAG Toolkit

The graphrag-toolkit is a Python toolkit for building GraphRAG applications. It provides a framework for automating the construction of a graph from unstructured data, and composing question-answering strategies that query this graph when answering user questions.

The toolkit uses low-level LlamaIndex components – data connectors, metadata extractors, and transforms – to implement much of the graph construction process. By default, the toolkit uses Amazon Neptune Analytics or Amazon Neptune Database for its graph store, and Neptune Analytics or Amazon OpenSearch Serverless for its vector store, but it also provides extensibility points for adding alternative graph stores and vector stores. The default backend for LLMs and embedding models is Amazon Bedrock. As with the stores, the toolkit can be configured to use other LLM and embedding model backends through LlamaIndex abstractions.

If you're running on AWS, there's a quick start AWS CloudFormation template in the examples directory. Note that you must run your application in an AWS region containing the Amazon Bedrock foundation models used by the toolkit (see the configuration section in the documentation for details on the default models used), and must enable access to these models before running any part of the solution.
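One way to sanity-check a Region before running the toolkit is to compare the foundation models Amazon Bedrock lists there against the models you plan to use. The sketch below is a minimal illustration, not part of the toolkit: `missing_models`, `list_region_model_ids`, and the placeholder IDs in `REQUIRED_MODEL_IDS` are hypothetical, and the Region lookup assumes boto3 and AWS credentials are available. A model appearing in the Region listing does not by itself confirm that access to it has been enabled for your account.

```python
def missing_models(available_ids, required_ids):
    """Return the required model IDs that are absent from available_ids, in order."""
    available = set(available_ids)
    return [model_id for model_id in required_ids if model_id not in available]


def list_region_model_ids(region_name):
    """List the foundation model IDs that Amazon Bedrock offers in a Region.

    Requires boto3 and AWS credentials, so the import is kept local and the
    function is not called in the offline demonstration below.
    """
    import boto3

    bedrock = boto3.client('bedrock', region_name=region_name)
    summaries = bedrock.list_foundation_models()['modelSummaries']
    return [summary['modelId'] for summary in summaries]


# Hypothetical placeholder IDs: substitute the default models named in the
# configuration section of the toolkit documentation.
REQUIRED_MODEL_IDS = ['example.llm-model-v1', 'example.embedding-model-v1']

# Offline demonstration with a made-up Region listing.
print(missing_models(['example.llm-model-v1'], REQUIRED_MODEL_IDS))
```

With credentials in place, `missing_models(list_region_model_ids('us-east-1'), REQUIRED_MODEL_IDS)` would perform the same comparison against a real Region.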

Installation

Installing the graphrag-toolkit requires Python and pip. Install it with pip:

$ pip install https://github.com/awslabs/graphrag-toolkit/archive/refs/tags/v1.1.2.zip

Supported Python versions

The graphrag-toolkit requires Python 3.10 or greater.
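A quick way to confirm both requirements is a small standard-library check. This is a sketch under stated assumptions: the helper names `python_is_supported` and `toolkit_is_installed` are hypothetical, and the only toolkit detail relied on is the `graphrag_toolkit` import name used in the examples in this README.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def toolkit_is_installed():
    """Return True if the graphrag_toolkit package can be found on the import path."""
    return importlib.util.find_spec('graphrag_toolkit') is not None


if __name__ == '__main__':
    print('Python version supported:', python_is_supported())
    print('graphrag_toolkit importable:', toolkit_is_installed())
```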

Example of use

Indexing

import os

from graphrag_toolkit import LexicalGraphIndex
from graphrag_toolkit.storage import GraphStoreFactory
from graphrag_toolkit.storage import VectorStoreFactory

from llama_index.readers.web import SimpleWebPageReader

import nest_asyncio
nest_asyncio.apply()

def run_extract_and_build():

    graph_store = GraphStoreFactory.for_graph_store(
        'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
    )
    
    vector_store = VectorStoreFactory.for_vector_store(
        'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
    )

    graph_index = LexicalGraphIndex(
        graph_store, 
        vector_store
    )

    doc_urls = [
        'https://docs.aws.amazon.com/neptune/latest/userguide/intro.html',
        'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html',
        'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-features.html',
        'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-vs-neptune-database.html'
    ]

    docs = SimpleWebPageReader(
        html_to_text=True,
        metadata_fn=lambda url:{'url': url}
    ).load_data(doc_urls)

    graph_index.extract_and_build(docs, show_progress=True)

if __name__ == '__main__':
    run_extract_and_build()
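The example above hard-codes the graph store and vector store connection strings. A common alternative is to read them from the environment so the same script can run against different stores. The following is a minimal sketch, not toolkit behavior: the `GRAPH_STORE` and `VECTOR_STORE` variable names and the `store_connection_strings` helper are assumptions made for illustration, and the defaults are the placeholder endpoints from the example.

```python
import os

# Placeholder endpoints copied from the indexing example above.
DEFAULT_GRAPH_STORE = 'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
DEFAULT_VECTOR_STORE = 'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'


def store_connection_strings(env=None):
    """Return (graph_store, vector_store) connection strings.

    GRAPH_STORE and VECTOR_STORE are hypothetical variable names chosen for
    this sketch; the defaults are used when a variable is not set.
    """
    env = os.environ if env is None else env
    return (
        env.get('GRAPH_STORE', DEFAULT_GRAPH_STORE),
        env.get('VECTOR_STORE', DEFAULT_VECTOR_STORE),
    )


if __name__ == '__main__':
    graph_store_info, vector_store_info = store_connection_strings()
    print(graph_store_info)
    print(vector_store_info)
```

The two returned strings can then be passed to `GraphStoreFactory.for_graph_store` and `VectorStoreFactory.for_vector_store` in place of the literals.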

Querying

from graphrag_toolkit import LexicalGraphQueryEngine
from graphrag_toolkit.storage import GraphStoreFactory
from graphrag_toolkit.storage import VectorStoreFactory

import nest_asyncio
nest_asyncio.apply()

def run_query():

    graph_store = GraphStoreFactory.for_graph_store(
        'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
    )

    vector_store = VectorStoreFactory.for_vector_store(
        'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
    )

    query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
        graph_store,
        vector_store
    )

    response = query_engine.query('''What are the differences between Neptune Database 
                                     and Neptune Analytics?''')

    print(response.response)

if __name__ == '__main__':
    run_query()

Documentation

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.