s-evsyukov/Spark_Poetry

Spark Poetry

Skills and tools:

PySpark PyTest Poetry Wheel Quinn


Task: Create a PySpark project with the Poetry dependency management system (DMS):

  • Build the project with Poetry, a dependency management system (DMS) for Python
  • Add the Quinn package to the project with Poetry
  • Create and run tests to verify that the PySpark and Quinn packages are installed and work correctly
  • Package the PySpark project as a wheel file

Work progress:

  1. Installing Poetry

  2. Creating the project with Poetry, a dependency manager for Python

poetry new Spark_Poetry
  3. Adding PySpark to the project
poetry add pyspark
  4. Setting the Python interpreter in IDEA to the Poetry environment and specifying its path:
/home/vt/.cache/pypoetry/virtualenvs/spark-standalone-RRWaD6iA-py3.10/bin/python3.10
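
One way to confirm that the IDE really picked up the Poetry environment is to run a short check with the configured interpreter. This is a minimal sketch; the helper name in_virtualenv is hypothetical, and it assumes the Poetry environment behaves like a standard virtual environment, where sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtualenv() -> bool:
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix points at the base Python installation.
    return sys.prefix != sys.base_prefix


# The printed path should match the interpreter path set in the IDE.
print(sys.executable)
print(in_virtualenv())
```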
  5. Installing Python distutils, which Poetry needs to update dependencies
sudo apt-get install python3.10-distutils
  6. Creating the Spark session
  7. Creating a file with the DataFrame transformation
  8. Creating a PyTest test for the transformation file
  9. Executing the first test:
PASSED  [100%]

+----+---+
|name|age|
+----+---+
|jose|  1|
|  li|  2|
+----+---+

+----+---+---------+
|name|age|Greetings|
+----+---+---------+
|jose|  1|   hello!|
|  li|  2|   hello!|
+----+---+---------+
  10. Adding the Quinn dependency to the project:
poetry add quinn
  11. Creating a second test with a new DataFrame that contains non-word characters. The quinn.remove_non_word_characters() function removes them.

  12. Executing the second test:

PASSED [100%]

+----------+------+
|first_name|letter|
+----------+------+
|    jo&&se|     a|
|      ##li|     b|
|   !!sam**|     c|
+----------+------+

+----------+------+----------------+
|first_name|letter|Clean_first_name|
+----------+------+----------------+
|    jo&&se|     a|            jose|
|      ##li|     b|              li|
|   !!sam**|     c|             sam|
+----------+------+----------------+
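
To make the cleaning step concrete, here is a plain-Python approximation of what happens to each first_name value. It assumes that quinn.remove_non_word_characters() strips every character matching the pattern [^\w\s]+, which is how the function is understood to work but is not confirmed by this README; the helper name strip_non_word is hypothetical. In the Spark test itself the call would be applied to a column, for example quinn.remove_non_word_characters(F.col("first_name")).

```python
import re

# Assumed pattern: anything that is neither a word character nor whitespace.
NON_WORD = re.compile(r"[^\w\s]+")


def strip_non_word(value: str) -> str:
    # Remove every run of non-word, non-whitespace characters.
    return NON_WORD.sub("", value)


# Sample values taken from the first_name column above.
for raw in ["jo&&se", "##li", "!!sam**"]:
    print(raw, "->", strip_non_word(raw))
```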
  13. Specifying the package name in pyproject.toml
packages = [
    { include = "Spark_Poetry" }
]
  14. Packaging the project as a wheel file
poetry build

>>> Building Spark_Poetry (0.1.0)
  - Building sdist
  - Built Spark_Poetry-0.1.0.tar.gz
  - Building wheel
  - Built Spark_Poetry-0.1.0-py3-none-any.whl
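
The wheel file name encodes what the package is compatible with. The sketch below splits the name produced above into its parts, assuming the standard wheel naming convention (distribution, version, optional build tag, Python tag, ABI tag, platform tag); WheelName and parse_wheel_name are hypothetical helpers, not part of Poetry.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    # Expected shape: name-version[-build]-python-abi-platform.whl
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelName(dist, version, build, py, abi, plat)


info = parse_wheel_name("Spark_Poetry-0.1.0-py3-none-any.whl")
print(info)
```

Here py3-none-any indicates a pure-Python wheel: any Python 3 interpreter, no ABI requirement, any platform.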

About

Spark and Poetry practice
