- This project examines e-commerce sales and individuals' purchasing patterns in Europe, focusing on the availability of broadband technologies (both mobile and fixed broadband) by year. The process began with collaborative discussions on topic selection, exploration and data gathering, using a Miro board for brainstorming and organization. Relevant datasets were sourced from the Eurostat portal. Data cleaning involved handling null values by removing the affected columns, eliminating duplicated columns and standardizing column names for consistency. Additionally, pivot tables were created and the data was visualized to highlight key insights.
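The three cleaning steps described above can be sketched in pandas as follows. This is a minimal sketch, not the project's actual code: the function name `clean_dataset` and the sample columns are hypothetical, and "duplicated columns" is assumed to mean columns holding identical values.

```python
import pandas as pd


def clean_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the cleaning steps: drop null columns, drop duplicates, rename."""
    # 1. Remove every column that contains at least one null value.
    df = df.dropna(axis="columns", how="any")
    # 2. Remove columns whose values repeat an earlier column (assumed meaning
    #    of "duplicated columns"); transposing lets us reuse DataFrame.duplicated.
    df = df.loc[:, ~df.T.duplicated()]
    # 3. Standardize names: trim whitespace, lowercase, spaces -> underscores.
    df.columns = df.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    return df


# Hypothetical raw extract with a null-containing column and a duplicate column.
raw = pd.DataFrame(
    {
        " Geo ": ["DE", "FR", "IT"],
        "Value 2021": [10.0, 20.0, 30.0],
        "Value copy": [10.0, 20.0, 30.0],
        "Flag": [None, "e", None],
    }
)
clean = clean_dataset(raw)
print(list(clean.columns))  # ['geo', 'value_2021']
```

Dropping whole columns is the simplest way to handle nulls, but it also discards the non-null values in those columns; on wide year-by-column tables, imputing or dropping rows may preserve more data.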
- Clone the repository:
```bash
git clone https://github.com/YourUsername/repository_name.git
```
- Install uv. If you're a macOS/Linux user, type:
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
If you're a Windows user, open an Anaconda PowerShell Prompt and type:
```powershell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```
- Create an environment (by default, uv creates it in a `.venv` folder):
```bash
uv venv
```
- Activate the environment. If you're a macOS/Linux user with a bash shell, type:
```bash
source .venv/bin/activate
```
If you're a macOS/Linux user with a csh/tcsh shell, type:
```csh
source .venv/bin/activate.csh
```
If you're a Windows user, type:
```powershell
.venv\Scripts\activate
```
- Install dependencies:
```bash
uv pip install -r requirements.txt
```
- Problem Statements:
- Current public opinion treats Europe as a uniform entity (Europe vs China vs the US)
- Overview statistics do not reflect the divergences between member states
- The apparent hierarchy of current digitalization levels turns out to be wrong (Germany is Europe's largest economy but lags behind in many of the relevant metrics)
- Hypotheses:
- Countries differ from each other, and internally, across different but related KPIs (business earnings vs. consumer purchasing habits)
- Subdividing countries into regions may reveal some homogeneity between them
- More granular analysis could yield surprising insights
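To check the regional-homogeneity hypothesis, each country can be tagged with a region and the spread of a KPI summarized per region. This is a sketch assuming a long table with `geo` (Eurostat country code) and `value` columns; the mapping is a partial, illustrative one based on the UN M49 sub-regions and is not necessarily the subdivision used in this project. The numbers in the example are made up.

```python
import pandas as pd

# Partial, illustrative mapping of Eurostat geo codes to UN M49 sub-regions.
# Assumption: the project's own regional subdivision may differ.
REGION_BY_GEO = {
    "DK": "Northern Europe", "FI": "Northern Europe", "SE": "Northern Europe",
    "DE": "Western Europe", "FR": "Western Europe", "NL": "Western Europe",
    "EL": "Southern Europe", "ES": "Southern Europe", "IT": "Southern Europe",
    "HU": "Eastern Europe", "PL": "Eastern Europe", "RO": "Eastern Europe",
}


def summarize_by_region(df: pd.DataFrame, value_col: str = "value") -> pd.DataFrame:
    """Tag each country row with a region and describe the spread per region."""
    tagged = df.assign(region=df["geo"].map(REGION_BY_GEO).fillna("Unassigned"))
    # A narrow min-max range within a region hints at homogeneity.
    return tagged.groupby("region")[value_col].agg(["median", "min", "max"])


# Made-up values for illustration only.
sample = pd.DataFrame(
    {"geo": ["SE", "DK", "IT", "ES", "PL"], "value": [80.0, 84.0, 40.0, 50.0, 45.0]}
)
print(summarize_by_region(sample))
```

Countries missing from the mapping end up in an "Unassigned" group rather than being silently dropped, which makes gaps in the mapping easy to spot.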
- Datasets (Eurostat):
- E-commerce sales of enterprises by size class of enterprise
- Internet purchases by individuals (2020 onwards)
- Broadband usage across Europe
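One way to give the three datasets listed above a common DataFrame structure is to melt each wide table (one column per year) into the same long layout and then concatenate them. This is a sketch: the helper `to_common_format` and the column names `dataset`, `geo`, `indicator`, `year` and `value` are assumptions, not the project's actual schema, and the toy inputs are invented.

```python
import pandas as pd

# Assumed shared layout for all three datasets (hypothetical column names).
COMMON_COLUMNS = ["dataset", "geo", "indicator", "year", "value"]


def to_common_format(df: pd.DataFrame, dataset: str, indicator_col: str) -> pd.DataFrame:
    """Melt a wide table (geo, <indicator_col>, one column per year) into long form."""
    year_cols = [c for c in df.columns if c not in ("geo", indicator_col)]
    long_df = df.melt(
        id_vars=["geo", indicator_col],
        value_vars=year_cols,
        var_name="year",
        value_name="value",
    )
    long_df = long_df.rename(columns={indicator_col: "indicator"})
    long_df["year"] = long_df["year"].astype(int)  # "2021" -> 2021
    long_df["dataset"] = dataset
    return long_df[COMMON_COLUMNS]


# Toy inputs standing in for two of the Eurostat tables.
ecommerce = pd.DataFrame(
    {"geo": ["DE", "FR"], "size_class": ["10-49", "10-49"], "2020": [1.0, 2.0], "2021": [3.0, 4.0]}
)
broadband = pd.DataFrame(
    {"geo": ["DE"], "technology": ["fibre"], "2020": [5.0], "2021": [6.0]}
)
combined = pd.concat(
    [
        to_common_format(ecommerce, "ecommerce", "size_class"),
        to_common_format(broadband, "broadband", "technology"),
    ],
    ignore_index=True,
)
print(combined)
```

A long layout keeps each dataset's own breakdown (enterprise size class, broadband technology, and so on) in a single `indicator` column, so tables with different dimensions can be stacked and filtered with the same code.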
- Challenges in creating a common DataFrame structure to make the different datasets compatible:
- Homogenization of datasets: the three datasets cover three different topics, so they had to be harmonized before they could be combined.
- Ambiguous labels and relative values: relative values and ambiguous labels made efficient data extraction difficult.
- Broadband data (central tendency issue): an unexpected drop in values in the broadband dataset was caused by an unsuitable measure of central tendency.
- Missing data for critical technologies: large gaps in critical values for a small subset of technologies over extended time periods made it impossible to visualize these data.
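The sketch below illustrates one way a naive average can produce a spurious drop: when a high-value country stops reporting, the mean over the remaining countries falls even though every reporting country improved. Comparing the naive mean with the median and with a mean over a balanced panel (countries reporting in every year), next to a count of reporting countries, makes both this artefact and the data gaps visible. The numbers and country names are made up, and this is not a claim about the exact cause in the project's broadband data.

```python
import numpy as np
import pandas as pd


def summarize_years(wide: pd.DataFrame) -> pd.DataFrame:
    """Compare central-tendency measures per year and report coverage.

    `wide` has one row per country and one column per year (NaN = missing).
    """
    balanced = wide.dropna(how="any")  # only countries reporting in every year
    return pd.DataFrame(
        {
            "n_reporting": wide.notna().sum(),
            "naive_mean": wide.mean(),  # skips NaN, so the country mix can shift
            "median": wide.median(),  # robust to outliers, not to a shifting mix
            "balanced_mean": balanced.mean(),  # same countries in every year
        }
    )


# Made-up values: country_a stops reporting in 2022, the others improve.
toy = pd.DataFrame(
    {"2021": [90.0, 60.0, 30.0], "2022": [np.nan, 62.0, 32.0]},
    index=["country_a", "country_b", "country_c"],
)
print(summarize_years(toy))
```

With these numbers the naive mean falls from 60 to 47 while the balanced-panel mean rises from 45 to 47. The median also falls from 60 to 47, so switching to the median alone would not remove this kind of artefact; keeping the set of countries fixed does.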
- Solutions:
- Ensured uniform formats, units, and structures across datasets.
- Converted relative values into absolute numbers where possible.
- Standardized ambiguous labels using clear definitions or mapping techniques.
- Identified and clarified vague labels.
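Converting relative values into absolute numbers and mapping ambiguous labels can be sketched as follows. It assumes a table of percentages with `geo`, `unit` and `value` columns and a separate series holding the reference population (individuals or enterprises, matching the unit) indexed by country code. The label map, the helper `to_absolute` and the population figures are hypothetical.

```python
import pandas as pd

# Hypothetical map from terse unit codes to explicit labels.
LABEL_MAP = {"PC_IND": "percent_of_individuals"}


def to_absolute(df: pd.DataFrame, population: pd.Series) -> pd.DataFrame:
    """Relabel units and convert percentages into absolute counts.

    Assumes `df` has columns geo, unit and value (a percentage), and that
    `population` holds the reference population, indexed by geo code.
    """
    out = df.copy()
    out["unit"] = out["unit"].replace(LABEL_MAP)
    # absolute = share / 100 * reference population of the same country
    out["absolute"] = out["value"] / 100 * out["geo"].map(population)
    return out


shares = pd.DataFrame({"geo": ["DE", "FR"], "unit": ["PC_IND", "PC_IND"], "value": [50.0, 25.0]})
population = pd.Series({"DE": 1000.0, "FR": 2000.0})  # toy figures, not real data
print(to_absolute(shares, population))
```

Countries missing from the population series get `NaN` in `absolute`, which fits the "where possible" caveat above: a conversion is only made when a matching reference population exists.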
- Data preparation steps (per dataset):
- Clean the DataFrame loaded from the dataset
- Pivot the DataFrame so that years are on the columns, giving a variable-by-year structure
- Reset the index and remove the name of the column index for better visualization
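The three preparation steps above map onto pandas calls: `pivot_table` to put years on the columns, `reset_index` to turn the row keys back into ordinary columns, and clearing `columns.name`. This is a sketch assuming a long table with `geo`, `indicator`, `year` and `value` columns (hypothetical names); the helper `to_variable_year` and the toy data are illustrative.

```python
import pandas as pd


def to_variable_year(long_df: pd.DataFrame) -> pd.DataFrame:
    """Pivot a long table so that each year becomes its own column."""
    # Years become columns; duplicate (geo, indicator, year) rows are averaged.
    wide = long_df.pivot_table(
        index=["geo", "indicator"], columns="year", values="value", aggfunc="mean"
    )
    # Move geo and indicator out of the index into regular columns.
    wide = wide.reset_index()
    # Drop the leftover "year" label on the column index for cleaner display.
    wide.columns.name = None
    return wide


long_df = pd.DataFrame(
    {
        "geo": ["DE", "DE", "FR", "FR"],
        "indicator": ["x", "x", "x", "x"],
        "year": [2020, 2021, 2020, 2021],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)
print(to_variable_year(long_df))
```

Because `pivot_table` silently aggregates duplicate keys, `DataFrame.pivot` (which raises a `ValueError` on duplicates) may be the safer choice when each country, indicator and year should appear only once.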
- Conclusions:
- E-commerce sales in Europe were relatively low compared with other regions of the world. Within Europe, Northern Europe recorded the highest volume of online sales. However, growth in online purchasing was more pronounced in regions where the share of individuals shopping online had historically been lower, largely driven by the COVID-19 pandemic. Despite the challenges in the data analysis, the study highlights that Europe's digitalization is not uniform across regions.
- Next steps and recommendations:
- Investigate broader digital adoption, including AI, IoT, cloud computing, and cybersecurity.
- Evaluate post-COVID behavioral shifts in digital consumption.
- Conduct country-specific deep dives to understand local barriers to digitalization.
- Develop targeted SME support programs to enhance digital competitiveness.