PySpark PyTest Poetry Wheel Quinn
- Build the project with Poetry, a dependency management system (DMS)
- Add Quinn package to project with Poetry
- Create and run tests to verify that the PySpark and Quinn packages are installed and work correctly
- Package PySpark project as wheel files
- Creating the project with Poetry dependency management for Python:
poetry new Spark_Poetry
- Adding PySpark to the project:
poetry add pyspark
- In IDEA, set the Python interpreter to the Poetry environment and set its path:
/home/vt/.cache/pypoetry/virtualenvs/spark-standalone-RRWaD6iA-py3.10/bin/python3.10
- Install Python Distutils so Poetry can update dependencies:
sudo apt-get install python3.10-distutils
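To confirm that the interpreter selected above actually sees the packages Poetry installed, a small check along these lines can be run with that interpreter. It is only a sketch: the helper name `installed_version` is hypothetical, and it relies on nothing beyond the standard library.

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Show which interpreter is running; it should be the Poetry virtualenv one.
print("interpreter:", sys.executable)
print("pyspark ->", installed_version("pyspark") or "not installed")
```

If `pyspark` is reported as not installed, the IDE is most likely still pointing at a different interpreter than the Poetry virtual environment.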
- Creating Spark session
- Creating a file with the DataFrame transformation
- Creating PyTest for transformation file
- Executing first test:
PASSED [100%]
+----+---+
|name|age|
+----+---+
|jose|  1|
|  li|  2|
+----+---+
+----+---+---------+
|name|age|Greetings|
+----+---+---------+
|jose|  1|   hello!|
|  li|  2|   hello!|
+----+---+---------+
- Adding Quinn dependency to project:
poetry add quinn
- Creating second test with a new DataFrame that contains non-word characters. The quinn.remove_non_word_characters() function removes them.
- Executing second test:
PASSED [100%]
+----------+------+
|first_name|letter|
+----------+------+
|    jo&&se|     a|
|      ##li|     b|
|   !!sam**|     c|
+----------+------+
+----------+------+----------------+
|first_name|letter|Clean_first_name|
+----------+------+----------------+
|    jo&&se|     a|            jose|
|      ##li|     b|              li|
|   !!sam**|     c|             sam|
+----------+------+----------------+
- Specify package name in pyproject.toml:
packages = [
    { include = "Spark_Poetry" }
]
- Package wheel file:
poetry build
>>> Building Spark_Poetry (0.1.0)
- Building sdist
- Built Spark_Poetry-0.1.0.tar.gz
- Building wheel
- Built Spark_Poetry-0.1.0-py3-none-any.whl
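Since a wheel is a zip archive, one quick way to check that the `packages` setting took effect is to list what the built file contains. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the location of the wheel (Poetry's default `dist/` directory) is an assumption about this project's layout.

```python
import zipfile


def wheel_members(wheel_path):
    """Return the file names stored inside a wheel (a wheel is a zip archive)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return wheel.namelist()


def wheel_has_package(wheel_path, package_name):
    """True if any archived file lives under the top-level directory `package_name`."""
    prefix = package_name + "/"
    return any(name.startswith(prefix) for name in wheel_members(wheel_path))
```

For example, `wheel_has_package("dist/Spark_Poetry-0.1.0-py3-none-any.whl", "Spark_Poetry")` should return `True` if the package directory was included in the build.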