PromCopilot

Project Structure

  • data/: Data files
    • question/: Contains questions for testing PromCopilot (230 in total) and history questions (50 in total).
    • result/: All result files generated by testing PromCopilot.
      • details/: Includes 230 subdirectories, each corresponding to a question in data/question/test_question.csv, used to store all details of the execution results.
      • *_rq1.json: Summary results for rq1.
      • *_rq2.json: Summary results for rq2.
      • *_rq3_*.json: Summary results for rq3.
    • import/: Data for bulk insertion of nodes and relationships into Neo4j.
    • middle/: Files generated during the execution process.
  • build/: Files required for deploying Neo4j and Elasticsearch.
  • constant/: Configuration items.
  • db/: Contains code for interacting with Elasticsearch and Neo4j.
  • nl2promql/: Contains core code for converting natural language to PromQL.
  • ablation.py: Code for conducting ablation experiments.
  • build_es.py: For batch insertion of data into Elasticsearch.
  • build_kg.py: For batch insertion of data into Neo4j.
  • query.py: Entry file for executing nl2promql.
  • summary.py: File for summarizing results (generates *_rq1.json, *_rq2.json, and *_rq3_*.json from the execution results).
  • requirements.txt: Environment file.
  • baseline/: Baseline method code and experimental results.

Environment Description

The following steps were performed on a host running Ubuntu 20.10 (GNU/Linux 5.8.0-48-generic x86_64) with the internal IP address 10.176.122.153. Replace this address with your own host's IP in the commands below.

How to Run

1. Deploy

1.1. Neo4j

Neo4j 5 requires Java 17. Ensure it is installed and configured, then check the Java version with:

java -version

You should see output similar to the following:

openjdk version "17.0.12" 2024-07-16
OpenJDK Runtime Environment (build 17.0.12+7-Ubuntu-1ubuntu220.04)
OpenJDK 64-Bit Server VM (build 17.0.12+7-Ubuntu-1ubuntu220.04, mixed mode, sharing)

Install Neo4j:

curl -fsSL https://debian.neo4j.com/neotechnology.gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/neo4j.gpg
echo "deb [signed-by=/usr/share/keyrings/neo4j.gpg] https://debian.neo4j.com stable 5" | sudo tee -a /etc/apt/sources.list.d/neo4j.list
sudo apt update
sudo apt-get install neo4j=1:5.14.0

Check the Neo4j version:

neo4j --version

You should see output:

5.14.0

Download the Neo4j plugins (APOC and Graph Data Science):

wget -P ./build/neo4j https://github.com/neo4j/apoc/releases/download/5.14.0/apoc-5.14.0-core.jar
wget -P ./build/neo4j https://github.com/neo4j/graph-data-science/releases/download/2.5.5/neo4j-graph-data-science-2.5.5.jar

Copy neo4j.conf and the plugin jars into the Neo4j configuration and plugin directories:

cp ./build/neo4j/neo4j.conf /etc/neo4j/
cp ./build/neo4j/apoc-5.14.0-core.jar /var/lib/neo4j/plugins/
cp ./build/neo4j/neo4j-graph-data-science-2.5.5.jar /var/lib/neo4j/plugins/

Start Neo4j:

sudo systemctl start neo4j.service

Check the running status of Neo4j:

sudo systemctl status neo4j.service

You should see output similar to the following:

● neo4j.service - Neo4j Graph Database
     Loaded: loaded (/lib/systemd/system/neo4j.service; disabled; vendor preset: enabled)
     Active: active (running) since Fri 2024-08-02 01:22:51 CST; 1min 14s ago
  ......

Visit http://10.176.122.153:7474. The default username and password are both neo4j; after logging in, you can set your own. The credentials used in the rest of this guide are:

  • username=neo4j
  • password=promcopilot
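
As a quick check that these credentials work, the connection can also be verified from Python with the official neo4j driver. A minimal sketch (check_neo4j.py is a hypothetical helper, not part of this repository):

# check_neo4j.py -- illustrative connectivity check, not part of the repo
from neo4j import GraphDatabase

driver = GraphDatabase.driver('neo4j://10.176.122.153:7687',
                              auth=('neo4j', 'promcopilot'))
driver.verify_connectivity()  # raises if the server is unreachable or auth fails
print('Neo4j connection OK')
driver.close()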

1.2. Elasticsearch

Create a configuration directory and start Elasticsearch 8.11.0 in Docker:

mkdir /root/es
cp ./build/es/elasticsearch.yml /root/es/elasticsearch.yml
docker network create elastic
docker run -d --name es --net elastic -p 9200:9200 -p 9300:9300 -v /root/es/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml -m 3GB docker.elastic.co/elasticsearch/elasticsearch:8.11.0

Check the Elasticsearch version:

curl -X GET "http://10.176.122.153:9200"

You should see output similar to the following:

{
  "name" : "ed1c8e469c45",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "0i65JY3HRpS5C3mPFoUI9Q",
  "version" : {
    "number" : "8.11.0",
    ......
  },
  "tagline" : "You Know, for Search"
}
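
The same check can be performed from Python with the elasticsearch client. A minimal sketch (check_es.py is a hypothetical helper, not part of this repository):

# check_es.py -- illustrative connectivity check, not part of the repo
from elasticsearch import Elasticsearch

es = Elasticsearch('http://10.176.122.153:9200')
info = es.info()  # returns the same data as the curl call above
print(info['version']['number'])  # expected: 8.11.0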

2. Environment

2.1. Config

Modify the configuration files in the constant/ directory:

  1. ./constant/es.py

# set your elasticsearch ip and port
ES_URL = 'http://10.176.122.153:9200'

  2. ./constant/kg.py

# set your neo4j ip and port
NEO4J_URL = 'neo4j://10.176.122.153:7687'
# set your neo4j username and password
NEO4J_USER = 'neo4j'
NEO4J_PASSWORD = 'promcopilot'

  3. ./constant/llm.py

# Load the OpenAI API key and base URL from environment variables.
# These need to be manually configured in the environment.
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
OPENAI_BASE_URL = os.getenv('OPENAI_BASE_URL')
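
Because constant/llm.py reads these values from the environment, export OPENAI_API_KEY and OPENAI_BASE_URL in your shell before running any of the scripts below. A minimal sketch to verify they are set (check_env.py is a hypothetical helper, not part of this repository):

# check_env.py -- illustrative sanity check, not part of the repo
import os

for var in ('OPENAI_API_KEY', 'OPENAI_BASE_URL'):
    print(f"{var}: {'set' if os.getenv(var) else 'MISSING'}")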

2.2. Python

The project uses Python 3.10. The dependencies and their versions are listed in requirements.txt; install them with pip3 install -r requirements.txt.

3. Prepare Data

3.1. Insert data into Neo4j

Copy the CSV files to the Neo4j import directory (/var/lib/neo4j/import/) for bulk insertion of entities and relationships.

sudo cp ./data/import/entity/*.csv /var/lib/neo4j/import/
sudo cp ./data/import/relationship/*.csv /var/lib/neo4j/import/

Bulk insert data into Neo4j:

python3 build_kg.py
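
The exact Cypher issued by build_kg.py is not shown here; as an illustration of this kind of bulk insert, a LOAD CSV statement run through the Python driver might look as follows (the metric.csv file name and Metric label are hypothetical placeholders):

# Illustrative sketch only -- the actual statements live in build_kg.py
from neo4j import GraphDatabase

driver = GraphDatabase.driver('neo4j://10.176.122.153:7687',
                              auth=('neo4j', 'promcopilot'))
with driver.session() as session:
    # LOAD CSV resolves file:/// paths against /var/lib/neo4j/import/
    session.run(
        "LOAD CSV WITH HEADERS FROM 'file:///metric.csv' AS row "
        "CREATE (:Metric {name: row.name})"  # hypothetical label and column
    )
driver.close()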

3.2. Insert data into Elasticsearch

Bulk insert data into Elasticsearch:

python3 build_es.py
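
As an illustration, the elasticsearch Python client supports this kind of bulk indexing through elasticsearch.helpers.bulk; a minimal sketch with a hypothetical index name and placeholder documents (the real logic lives in build_es.py):

# Illustrative sketch only -- the actual indexing logic lives in build_es.py
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch('http://10.176.122.153:9200')
docs = [{'name': 'node_cpu_seconds_total'}]  # placeholder documents
actions = ({'_index': 'metrics', '_source': d} for d in docs)  # 'metrics' is hypothetical
bulk(es, actions)  # sends the documents in batched bulk requests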

4. Run

4.1. PromCopilot Method

Convert the questions recorded in test_question.csv to PromQL, specifying gpt-3.5-turbo-0125 as the LLM:

python3 query.py -f ./data/question/test_question.csv -m gpt-3.5-turbo-0125

The entire process of generating PromQL is logged in the directory ./log/test/{question_id}/{model}/:

$ tree log/test    
log/test
├── 001
│   └── gpt-3.5-turbo-0125
│       ├── gpt-3.5-turbo-0125_lvp.json
│       ├── gpt-3.5-turbo-0125_md.json
│       ├── gpt-3.5-turbo-0125_metric.json
│       ├── gpt-3.5-turbo-0125_path.json
│       ├── gpt-3.5-turbo-0125_promql_prompt.json
│       └── time.json
├── 002
......

Note: The details of our test results for 230 cases are stored in the directory ./data/result/details/.

4.2. Ablation Study

Run the ablation experiments against the logs produced in the previous step:

python3 ablation.py -b ./log/test -m gpt-3.5-turbo-0125

The results of the ablation experiment are stored in the directory ./log/test/{question_id}/{model}/ablation/.

5. Eval

Use summary.py to obtain results for rq1, rq2, and rq3 (note that the correctness of the PromQL generated by the LLM needs to be manually labeled).

python3 summary.py -b ./log/test -m gpt-3.5-turbo-0125 -q 1
python3 summary.py -b ./log/test -m gpt-3.5-turbo-0125 -q 2 -a ./data/question/test_qa.csv
python3 summary.py -b ./log/test -m gpt-3.5-turbo-0125 -q 3

The summary results will be stored in the ./log/test directory.

$ tree log/test
log/test
├── 001
│   └── gpt-3.5-turbo-0125
│       ├── ablation
│       │   ├── gpt-3.5-turbo-0125_no_metrics.json
│       │   └── gpt-3.5-turbo-0125_no_triples.json
│       ├── gpt-3.5-turbo-0125_lvp.json
│       ├── gpt-3.5-turbo-0125_md.json
│       ├── gpt-3.5-turbo-0125_metric.json
│       ├── gpt-3.5-turbo-0125_path.json
│       ├── gpt-3.5-turbo-0125_promql_prompt.json
│       └── time.json
......
├── gpt-3.5-turbo-0125_result_rq1.json
├── gpt-3.5-turbo-0125_result_rq2.json
├── gpt-3.5-turbo-0125_result_rq3_no_metrics.json
└── gpt-3.5-turbo-0125_result_rq3_no_triples.json

Note: The experimental results for the 230 cases used in our experiment, along with the labeled results, are stored in the ./data/result/ directory.
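
The summary files are plain JSON, so they can be inspected directly; a minimal sketch for loading one of them (their exact schema is not documented here):

# Illustrative sketch only -- inspect the rq1 summary produced by summary.py
import json

with open('./log/test/gpt-3.5-turbo-0125_result_rq1.json') as f:
    print(json.load(f))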

6. Execute PromQL

6.1. Dataset

The dataset can be downloaded from https://drive.proton.me/urls/K2BV4TF300#PJvx62SZB5Xe. After downloading PromCopilotDataSet.tar.gz, extract it into /root/data/, which yields /root/data/PromCopilotDataSet:

mkdir /root/data
tar -xzvf ./PromCopilotDataSet.tar.gz -C /root/data/

The structure of the PromCopilotDataSet directory is as follows:

  • data_2024_05_18_2024_05_25/: Snapshot data from Kubernetes, Tempo, and Prometheus.
    • k8s/: Data from the Kubernetes API, containing various Kubernetes resources.
    • tempo/: Data from Grafana Tempo, used for querying traces.
    • prometheus/: Data from Prometheus, used for querying metrics.
  • question/: Questions used for generating PromQL.
    • crawled_samples/: Stack Overflow and tutorial examples referenced for generating questions.
    • history.csv: 50 cases used for constructing the historical vector database in the baseline method.
    • question.csv: 230 cases used for testing.
    • qa.csv: Detailed annotations for the 230 cases, including metrics and knowledge triples.

6.2. Prometheus

Download and run Prometheus 2.45.0 against the snapshot data:

cd /root/
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar -zxvf prometheus-2.45.0.linux-amd64.tar.gz
cd /root/prometheus-2.45.0.linux-amd64/
./prometheus --storage.tsdb.path="/root/data/PromCopilotDataSet/data_2024_05_18_2024_05_25/prometheus/20240525T130914Z-32cc7324286dcb46" --web.enable-lifecycle --storage.tsdb.retention.time=365d --web.listen-address=":19090"

After deployment is complete, open http://10.176.122.153:19090/ and set the Evaluation time to a moment between 2024-05-18 10:00:00 UTC and 2024-05-25 10:00:00 UTC, the window covered by the snapshot.
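
Generated PromQL can then be executed programmatically through the Prometheus HTTP API. A minimal sketch using requests with an instant query (the up query is a placeholder; substitute the PromQL produced by PromCopilot):

# Illustrative sketch only -- run a PromQL instant query against the snapshot
import requests

resp = requests.get(
    'http://10.176.122.153:19090/api/v1/query',
    params={'query': 'up', 'time': '2024-05-20T10:00:00Z'},  # must fall inside the snapshot window
)
print(resp.json()['data']['result'])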
