AnnafCouto/README.md

Hello, I am Anna! 🚀📊

Data Scientist & Analyst | MSc Analytical Chemistry | BSc Biotechnology

I am passionate about the science behind data acquisition and analysis. I started out using Python for bioinformatics during my BSc, then moved on to metabolomics (the study of metabolite data) with R during my internship, where I developed an R workflow for that kind of data that grew into a full R package: PipMet!

Then I went back to Python and Excel (VBA/macros) to work on clinical trial data as a Data Assistant.

After that, a two-year MSc gave me plenty of time to investigate huge amounts of (very complex) chemical data, using every tool I could get my hands on (Python, R, MATLAB, SQL, visualization tools ...).

I have now decided to move out of academia and bring all my academic strengths to the business world. I aspire to apply structured scientific thinking to ongoing, real-world data as a Data Scientist.

My focus is more than just running code. It is to understand the business logic, structure the analysis, and discuss data, results, and algorithm performance in order to deliver actionable value, whether in Finance, Retail, or Tech.

  • 🔭 Current focus: Developing Power BI projects and advancing Python skills for Machine Learning.
  • 💼 Differentiator: Fast learning combined with critical, scientific thinking.
  • 🎯 Goal: To work as a Data Analyst or Data Scientist in dynamic, data-driven environments.

🛠️ Tech Stack & Tools

Data Analysis & Engineering

SQL Python Pandas ETL R

Visualization & Business Intelligence

Power BI Matplotlib Storytelling


🧪 Projects

🧬 Big Data Analysis: ChEMBL Database

Insight extraction from a complex database with 2.8 million records.
Even though it is a scientific source, the challenge was strongly data-driven: I used SQL (Window Functions, CTEs) to clean, format, and cross-reference multiple tables. The goal was to perform an exploratory analysis to identify market trends, bioactivity potentials, and quality compliance.

Check code and analysis »
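As a rough illustration of the CTE plus window function pattern mentioned for the ChEMBL project, here is a minimal sketch using Python's built-in `sqlite3` module. The tables `compounds` and `activities`, their columns, and the toy rows are hypothetical stand-ins, not the real ChEMBL schema or the queries from the repository. It assumes an SQLite build with window function support (3.25 or newer).

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE compounds (compound_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE activities (compound_id INTEGER, value_nm REAL);
        INSERT INTO compounds VALUES (1, ' compound a '), (2, 'Compound B');
        INSERT INTO activities VALUES (1, 120.0), (1, 35.5), (1, NULL), (2, 900.0);
    """)
    return conn


def most_potent_per_compound(conn):
    """Return the lowest (most potent) measurement for each compound."""
    query = """
    WITH cleaned AS (      -- CTE: join tables, normalise names, drop NULLs
        SELECT c.compound_id,
               TRIM(UPPER(c.name)) AS name,
               a.value_nm
        FROM compounds AS c
        JOIN activities AS a ON a.compound_id = c.compound_id
        WHERE a.value_nm IS NOT NULL
    ),
    ranked AS (            -- window function: rank rows within each compound
        SELECT compound_id, name, value_nm,
               ROW_NUMBER() OVER (
                   PARTITION BY compound_id ORDER BY value_nm
               ) AS rn
        FROM cleaned
    )
    SELECT compound_id, name, value_nm
    FROM ranked
    WHERE rn = 1
    ORDER BY compound_id;
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in most_potent_per_compound(build_demo_db()):
        print(row)
```

Keeping the cleaning step in one CTE and the ranking in another makes each stage easy to inspect on its own, which helps when cross-referencing several tables.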

📦 PipMet: R package for Metabolomics

Thesis Project: A comparative analysis of R libraries for processing high-dimensional data.
Developed a framework to evaluate data quality metrics, redundancy reduction, and annotation accuracy. This project evolved into a fully functional R package.

Check package and tutorials »

🎓 BSc Thesis: Metabolomics Pipelines Evaluation

Benchmarking of different pipelines for metabolomic data processing in R.
The repository contains the raw datasets, the processing scripts in R, and the resulting performance metrics.

Check code and analysis »


📈 My GitHub Journey

Popular repositories

  1. AnnafCouto (Public)

    Config files for my GitHub profile.

  2. PipMet (Public)

    Repository for PipMet package

    R

  3. TCC (Public)

    Evaluation of different GC-MS data processing tools for metabolomics

    R

  4. chembl-drug-discovery (Public)

    SQL analysis of the ChEMBL database (v. 36) to extract drug discovery trends, validate bioactivity rules (Lipinski's Rule), and identify high-potency compounds.

    Python
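The chembl-drug-discovery description above mentions validating Lipinski's Rule. A minimal sketch of such a check in plain Python might look like the following. The `Descriptors` class, the function names, and the example values are hypothetical and are not taken from the repository. The thresholds are the standard Rule of Five limits, and allowing at most one violation is a common convention assumed here.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for precomputed molecular descriptors."""
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen bond donors
    h_acceptors: int    # hydrogen bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many Rule of Five criteria the compound violates."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: pass when there is at most one violation."""
    return lipinski_violations(d) <= max_violations


if __name__ == "__main__":
    # Made-up descriptor values, for illustration only.
    small = Descriptors(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)
    large = Descriptors(mol_weight=800.0, logp=6.5, h_donors=2, h_acceptors=12)
    print(passes_lipinski(small), passes_lipinski(large))
```

In a SQL-based workflow the same four comparisons could be written as `CASE` expressions over descriptor columns; the Python version just makes the logic easy to unit test.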