- data/: Data files.
  - question/: Questions for testing PromCopilot (230 in total) and history questions (50 in total).
  - result/: All result files generated by testing PromCopilot.
    - details/: 230 subdirectories, each corresponding to a question in data/question/test_question.csv, storing all details of the execution results.
    - *_rq1.json: Summary results for RQ1.
    - *_rq2.json: Summary results for RQ2.
    - *_rq3_*.json: Summary results for RQ3.
  - import/: Data for bulk insertion of nodes and relationships into Neo4j.
  - middle/: Files generated during the execution process.
- build/: Files required for deploying Neo4j and Elasticsearch.
- constant/: Configuration items.
- db/: Code for interacting with Elasticsearch and Neo4j.
- nl2promql/: Core code for converting natural language to PromQL.
- ablation.py: Code for running the ablation experiments.
- build_es.py: Bulk insertion of data into Elasticsearch.
- build_kg.py: Bulk insertion of data into Neo4j.
- query.py: Entry point for executing NL2PromQL.
- summary.py: Summarizes execution results (generates *_rq1.json, *_rq2.json, and *_rq3_*.json).
- requirements.txt: Python dependencies.
- baseline/: Baseline method code and experimental results.
The following steps were performed on a host running Ubuntu 20.10 (GNU/Linux 5.8.0-48-generic x86_64) with the internal IP address 10.176.122.153.
Neo4j requires Java. Ensure the correct version of Java is installed and configured. Check the Java version with:
java -version
You should see output similar to the following:
openjdk version "17.0.12" 2024-07-16
OpenJDK Runtime Environment (build 17.0.12+7-Ubuntu-1ubuntu220.04)
OpenJDK 64-Bit Server VM (build 17.0.12+7-Ubuntu-1ubuntu220.04, mixed mode, sharing)
Install neo4j:
curl -fsSL https://debian.neo4j.com/neotechnology.gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/neo4j.gpg
echo "deb [signed-by=/usr/share/keyrings/neo4j.gpg] https://debian.neo4j.com stable 5" | sudo tee -a /etc/apt/sources.list.d/neo4j.list
sudo apt update
sudo apt-get install neo4j=1:5.14.0
Check the Neo4j version:
neo4j --version
You should see output:
5.14.0
Download neo4j plugins:
wget -P ./build/neo4j https://github.com/neo4j/apoc/releases/download/5.14.0/apoc-5.14.0-core.jar
wget -P ./build/neo4j https://github.com/neo4j/graph-data-science/releases/download/2.5.5/neo4j-graph-data-science-2.5.5.jar
Copy neo4j.conf and the plugins to the Neo4j directories:
cp ./build/neo4j/neo4j.conf /etc/neo4j/
cp ./build/neo4j/apoc-5.14.0-core.jar /var/lib/neo4j/plugins/
cp ./build/neo4j/neo4j-graph-data-science-2.5.5.jar /var/lib/neo4j/plugins/
Start Neo4j:
sudo systemctl start neo4j.service
Check the running status of Neo4j:
sudo systemctl status neo4j.service
You should see output similar to the following:
● neo4j.service - Neo4j Graph Database
Loaded: loaded (/lib/systemd/system/neo4j.service; disabled; vendor preset: enabled)
Active: active (running) since Fri 2024-08-02 01:22:51 CST; 1min 14s ago
......
Visit the page: http://10.176.122.153:7474
The default username and password are both neo4j.
After logging in, you can set your own username and password. The username and password used in the following steps are:
- username=neo4j
- password=promcopilot
mkdir /root/es
cp ./build/es/elasticsearch.yml /root/es/elasticsearch.yml
docker network create elastic
docker run -d --name es --net elastic -p 9200:9200 -p 9300:9300 -v /root/es/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml -m 3GB docker.elastic.co/elasticsearch/elasticsearch:8.11.0
Check the Elasticsearch version:
curl -X GET "http://10.176.122.153:9200"
{
"name" : "ed1c8e469c45",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "0i65JY3HRpS5C3mPFoUI9Q",
"version" : {
"number" : "8.11.0",
......
},
"tagline" : "You Know, for Search"
}
Modify the configuration files in the constant directory:
./constant/es.py
# set your elasticsearch ip and port
ES_URL = 'http://10.176.122.153:9200'
./constant/kg.py
# set your neo4j ip and port
NEO4J_URL = 'neo4j://10.176.122.153:7687'
# set your neo4j username and password
NEO4J_USER = 'neo4j'
NEO4J_PASSWORD = 'promcopilot'
./constant/llm.py
# Load the OpenAI API key and base URL from environment variables.
# These need to be manually configured in the environment.
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
OPENAI_BASE_URL = os.getenv('OPENAI_BASE_URL')
The Python version is 3.10. The dependencies and their versions are listed in requirements.txt.
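Since constant/llm.py reads the OpenAI credentials from environment variables, it can help to sanity-check them before launching a long run. The helper below is a hypothetical sketch (not part of the repository), shown with an explicit mapping instead of the real environment:

```python
import os

def check_llm_env(env=None):
    """Fail fast if the OpenAI API key is unset (hypothetical helper,
    not part of the repository)."""
    env = os.environ if env is None else env
    key = env.get('OPENAI_API_KEY')
    base = env.get('OPENAI_BASE_URL')  # optional: the OpenAI default is used when unset
    if not key:
        raise RuntimeError('export OPENAI_API_KEY before running query.py')
    return key, base

# Illustration with an explicit mapping rather than os.environ:
key, base = check_llm_env({'OPENAI_API_KEY': 'sk-example'})
print(key)   # sk-example
print(base)  # None
```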
Copy the CSV files to the Neo4j import directory (/var/lib/neo4j/import/) for bulk insertion of entities and relationships.
sudo cp ./data/import/entity/*.csv /var/lib/neo4j/import/
sudo cp ./data/import/relationship/*.csv /var/lib/neo4j/import/
Bulk insert data into Neo4j:
python3 build_kg.py
Bulk insert data into Elasticsearch:
python3 build_es.py
Convert the questions recorded in test_question.csv to PromQL, specifying the LLM as gpt-3.5-turbo-0125:
python3 query.py -f ./data/question/test_question.csv -m gpt-3.5-turbo-0125
The entire PromQL generation process is logged under ./log/test/{question_id}/{model}/:
$ tree log/test
log/test
├── 001
│ └── gpt-3.5-turbo-0125
│ ├── gpt-3.5-turbo-0125_lvp.json
│ ├── gpt-3.5-turbo-0125_md.json
│ ├── gpt-3.5-turbo-0125_metric.json
│ ├── gpt-3.5-turbo-0125_path.json
│ ├── gpt-3.5-turbo-0125_promql_prompt.json
│ └── time.json
├── 002
......
Note: The details of our test results for 230 cases are stored in the directory ./data/result/details/.
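A finished run leaves a time.json in each question's model directory (see the tree above), so checking which of the 230 questions completed can be done with a short script. The helper below is a hypothetical sketch, demonstrated on a throwaway directory that mimics ./log/test:

```python
import json
import os
import tempfile

def completed_questions(log_root, model):
    """List question ids whose run produced a time.json, i.e. finished end-to-end.
    Hypothetical helper, not part of the repository."""
    done = []
    for qid in sorted(os.listdir(log_root)):
        marker = os.path.join(log_root, qid, model, 'time.json')
        if os.path.isfile(marker):
            done.append(qid)
    return done

# Illustration on a throwaway directory mimicking ./log/test:
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, '001', 'gpt-3.5-turbo-0125'))
with open(os.path.join(root, '001', 'gpt-3.5-turbo-0125', 'time.json'), 'w') as f:
    json.dump({}, f)
print(completed_questions(root, 'gpt-3.5-turbo-0125'))  # ['001']
```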
python3 ablation.py -b ./log/test -m gpt-3.5-turbo-0125
The results of the ablation experiment are stored in ./log/test/{question_id}/{model}/ablation/.
Use summary.py to obtain the results for RQ1, RQ2, and RQ3 (note that the correctness of the PromQL generated by the LLM must be manually labeled).
python3 summary.py -b ./log/test -m gpt-3.5-turbo-0125 -q 1
python3 summary.py -b ./log/test -m gpt-3.5-turbo-0125 -q 2 -a ./data/question/test_qa.csv
python3 summary.py -b ./log/test -m gpt-3.5-turbo-0125 -q 3
The summary results will be stored in the ./log/test directory.
$ tree log/test
log/test
├── 001
│ └── gpt-3.5-turbo-0125
│ ├── ablation
│ │ ├── gpt-3.5-turbo-0125_no_metrics.json
│ │ └── gpt-3.5-turbo-0125_no_triples.json
│ ├── gpt-3.5-turbo-0125_lvp.json
│ ├── gpt-3.5-turbo-0125_md.json
│ ├── gpt-3.5-turbo-0125_metric.json
│ ├── gpt-3.5-turbo-0125_path.json
│ ├── gpt-3.5-turbo-0125_promql_prompt.json
│ └── time.json
......
├── gpt-3.5-turbo-0125_result_rq1.json
├── gpt-3.5-turbo-0125_result_rq2.json
├── gpt-3.5-turbo-0125_result_rq3_no_metrics.json
└── gpt-3.5-turbo-0125_result_rq3_no_triples.json
Note: The experimental results for the 230 cases used in our experiment, along with the labeled results, are stored in the ./data/result/ directory.
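Once the manual correctness labels are in place, aggregating them into an accuracy figure is straightforward. The record format below is purely illustrative (the repo's actual labeled files under ./data/result/ define their own schema):

```python
# Purely illustrative label records; the real labeled files live under
# ./data/result/ and use their own schema.
labels = [
    {'question_id': '001', 'correct': True},
    {'question_id': '002', 'correct': False},
    {'question_id': '003', 'correct': True},
]

# Fraction of questions whose generated PromQL was labeled correct.
accuracy = sum(r['correct'] for r in labels) / len(labels)
print(f'{accuracy:.2f}')  # 0.67
```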
The download link for the dataset is: https://drive.proton.me/urls/K2BV4TF300#PJvx62SZB5Xe
After downloading PromCopilotDataSet.tar.gz, extract it to /root/data/ (producing /root/data/PromCopilotDataSet):
mkdir /root/data
tar -xzvf ./PromCopilotDataSet.tar.gz -C /root/data/
The structure of the PromCopilotDataSet directory is as follows:
- data_2024_05_18_2024_05_25/: Snapshot data from Kubernetes, Tempo, and Prometheus.
  - k8s/: Data from the Kubernetes API, containing various Kubernetes resources.
  - tempo/: Data from Grafana Tempo, used for querying traces.
  - prometheus/: Data from Prometheus, used for querying metrics.
- question/: Questions used for generating PromQL.
  - crawled_samples/: Stack Overflow and tutorial examples referenced when generating the questions.
  - history.csv: 50 cases used for constructing the historical vector database in the baseline method.
  - question.csv: 230 cases used for testing.
  - qa.csv: Detailed annotations for the 230 cases, including metrics and knowledge triples.
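The test cases in question.csv can be iterated with Python's csv module. The header names below ('id', 'question') are illustrative assumptions and should be checked against the actual header row of the file inside the dataset:

```python
import csv
import io

# Illustrative two-row sample; the real file is question/question.csv inside
# the dataset, and its column names should be checked against its header row.
sample = (
    "id,question\n"
    "001,What is the CPU usage of each node?\n"
    "002,How many pods are running per namespace?\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
print(len(rows))       # 2
print(rows[0]['id'])   # 001
```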
Run Prometheus:
cd /root/
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar -zxvf prometheus-2.45.0.linux-amd64.tar.gz
cd /root/prometheus-2.45.0.linux-amd64/
./prometheus --storage.tsdb.path="/root/data/PromCopilotDataSet/data_2024_05_18_2024_05_25/prometheus/20240525T130914Z-32cc7324286dcb46" --web.enable-lifecycle --storage.tsdb.retention.time=365d --web.listen-address=":19090"
After deployment is complete, visit http://10.176.122.153:19090/.
Set the Evaluation time to a moment between 2024-05-18 10:00:00 UTC and 2024-05-25 10:00:00 UTC.
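When querying the snapshot through Prometheus' HTTP API rather than the web UI, the evaluation time can be passed as a Unix timestamp; the UTC bounds above correspond to the following values (a minimal sketch, no network access required):

```python
from datetime import datetime, timezone

# UTC bounds of the retained snapshot window (see above).
start = datetime(2024, 5, 18, 10, 0, tzinfo=timezone.utc)
end = datetime(2024, 5, 25, 10, 0, tzinfo=timezone.utc)

print(int(start.timestamp()))  # 1716026400
print(int(end.timestamp()))    # 1716631200
```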