Some comments #14

@Enchufa2

Elegance

I appreciate Python's elegance too, but it's also true that spaces-vs-tabs issues are a real pain.

Learning curve

I wouldn't call it a huge win for R. It's true that, for data science, base R gives you almost everything you need, which Python doesn't. But how many people work with plain R?

And putting aside the tools that are specific to data science, i.e., talking about the language itself (which is the first thing you need to master before you start learning data science), that's a win for Python, because I think it's far more intuitive and easier to learn. R has many strange things that are unique to R, such as the ability to modify itself, non-standard evaluation (NSE), etc. These are versatile features, but they are hard to understand and master.

All in all, I would call it a tie.

Available libraries

I don't see many Python data science libraries backed by an academic publication, and that's a small win for R, in my opinion.

Machine learning

The big actors are pushing for Python here, that's the truth. R tries to follow, but it's still behind.

Statistical correctness

I reaffirm what I said before about academic publications. I think it's important to highlight this point.

Object orientation, metaprogramming

I think that these categories deserve separate comments. I also like R's metaprogramming capabilities very much (they are great, but they make the language harder to learn, as I argued above). But I don't think it's fair to praise how seriously R treats functions as objects and, at the same time, to defend R's OOP mess over how seriously Python takes OOP.

Language unity

Whether the RStudio people see data.table as a competitor, I don't know, but I don't think so, for a few reasons. I don't think that dplyr and data.table are in the same league, or serve the same purpose, because dplyr does not provide a new data frame backend (that would be tibble, but tibbles are just data frames with attributes, so it's not competing either). dplyr's purpose is to define a standard data wrangling interface that is independent of the source: a data frame, a database... or even a data table, because there's the dtplyr package, a data.table backend for dplyr, developed by Hadley himself.

I don't understand what exactly The Tidyverse Curse is. Is it the pipe (which was there before the tidyverse, BTW)? Because you can use tidyverse functions without the pipe, and the look and feel would be very similar to the subset/transform/aggregate/reshape workflow you could do with base R. And you could use base R with the pipe too.
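
A rough sketch of why the pipe is orthogonal to the functions it chains, written in Python rather than R purely for illustration. The names `pipe`, `keep_group_a`, and `total_x` are hypothetical helpers, not tidyverse or magrittr APIs, and the data is made up.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed `value` through each function in `steps`, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Made-up rows standing in for a small data frame.
rows = [
    {"group": "a", "x": 1},
    {"group": "b", "x": 5},
    {"group": "a", "x": 3},
]


def keep_group_a(rs):
    """Filter step: keep only the rows in group 'a'."""
    return [r for r in rs if r["group"] == "a"]


def total_x(rs):
    """Aggregate step: sum the 'x' column."""
    return sum(r["x"] for r in rs)


# The same two functions, called without and with a pipe.
nested = total_x(keep_group_a(rows))
piped = pipe(rows, keep_group_a, total_x)
```

Both spellings run the same steps in the same order; the pipe only changes how the calls are written, which is why it can be added to or removed from either style.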

Linked data structures

I often need these, probably because of my CS background, and I would call it a big win for Python.
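
To make this concrete, here is a minimal sketch of the kind of structure meant here: a singly linked list built from objects that hold references to one another. The `Node` class and the helper names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    """One cell of a singly linked list (hypothetical name)."""
    value: int
    next: Optional["Node"] = None


def push_front(head: Optional[Node], value: int) -> Node:
    """Return a new head whose `next` points at the old head."""
    return Node(value, head)


def iterate(head: Optional[Node]) -> Iterator[int]:
    """Walk the list by following `next` references."""
    while head is not None:
        yield head.value
        head = head.next


# Build the list 1 -> 2 -> 3 by pushing onto the front.
head = None
for v in (3, 2, 1):
    head = push_front(head, v)

# Nodes are shared by reference, so changing one node in place
# is visible to everything that points at it.
second = head.next
second.value = 20
values = list(iterate(head))
```

The point of the sketch is the last three lines: the update goes through a reference and nothing is copied, which is what makes linked structures straightforward to write in Python.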

Packages

Package development and the CRAN infrastructure are a huge win for R. I was surprised that there's no mention of this.
