The biggest thing PyPy adds is JIT compilation, which is precisely what the project to add a JIT to CPython is working on. It's still early days for that project, but by 3.15 there's a good chance we'll see substantial speedups in some cases.
It's worth noting that PyPy devs are in the loop, and their insights so far have been invaluable.
That approach sometimes does work, but usually very poorly and often not at all.
It can work very well when the higher-up is well informed and does have deep technical experience and understanding. Steve Jobs and Elon Musk are great, well-known examples of this. They've also provided great examples of the same approach mostly failing when applied outside of their areas of deep expertise and understanding.
I'm very skeptical of these results: The sample size is much too small!
They started with only 64 individuals. Of those only ONE was found to be especially attractive to mosquitoes, and only TWO especially unattractive. They then used the scents from these three individuals, and 5 others who did not participate in the initial testing, for the following phases of research.
Also, of the 5 additional people, two were found to be especially attractive, and one unattractive. This is extremely inconsistent with the results from the first phase.
Now, they did a ton of work and show very interesting results. But these problems make all of their results very questionable in my mind.
Yes, followup research could shine more light on this to validate their results (or not), but I wish they would make less grand declarations about the meaningfulness of their results. I bet this is going to make the news around the world... and if it later turns out to be inaccurate, it's only going to further reduce the public's trust in science.
I don't understand what exactly you are complaining about. Can you elaborate on which specific statistical claim you think they should not make?
This is not an analysis of "what fraction of people are attractive to mosquitoes", it is about how mosquito attraction differs between people. You can make meaningful statistical claims about this with just two subjects (and lots of measurements, which they did - see e.g. Fig 1G).
I also don't understand how you complain about the insufficient sample size and then go on to claim the difference between two cohorts is "extremely inconsistent".
> You can make meaningful statistical claims about this with just two subjects
With just two subjects, there is a very high chance that the results of your testing are overly specific to those two subjects, and do not hold for most of the population.
With only 8 subjects, as in this study, that is still very true.
For example, people like the one person the mosquitos were extremely attracted to could be one in 10, one in 100 or one in a million. In the latter case, the findings are much, much less meaningful.
That's true even for the findings about specific genes affecting attraction. What if that one person is an extreme outlier and the mechanism causing the results isn't relevant for 99.9999% of the human population?
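To put rough numbers on the prevalence point, here is a minimal sketch. It assumes (a simplification that is mine, not the study's) that each of the 64 initial participants is an independent random draw from a population in which a fixed fraction of people are extreme "mosquito magnets". The function name and the three prevalence values are only illustrative.

```python
def chance_of_at_least_one(prevalence: float, sample_size: int = 64) -> float:
    """Probability that a random sample contains at least one extreme individual."""
    # Complement of "nobody in the sample is extreme".
    return 1.0 - (1.0 - prevalence) ** sample_size


# Illustrative prevalences taken from the comment above: 1 in 10, 100, a million.
for label, prevalence in [("1 in 10", 1e-1), ("1 in 100", 1e-2), ("1 in a million", 1e-6)]:
    print(f"{label}: {chance_of_at_least_one(prevalence):.6f}")
```

Under these assumptions, seeing one such person among 64 is unsurprising at a prevalence of 1 in 10 or 1 in 100 and very unlikely at 1 in a million, but a single observation cannot pin the prevalence down much further than that.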
> Can you elaborate on which specific statistical claim you think they should not make?
For example, the claim that "Highly attractive people have higher levels of carboxylic acids on their skin" does not seem to be well enough backed by evidence in this case. If they wrote, "the one/three attractive people we tested had higher levels of carboxylic acids on their skin", I'd have no complaints.
> For example, the claim that "Highly attractive people have higher levels of carboxylic acids on their skin" does not seem to be well enough backed by evidence in this case.
I might be missing something, but do you not see additional evidence for this claim in the discussion around figure S4?
In my reading, the authors do go out of their way to point out that carboxylic acid presence is only one of possibly many more factors. (See also "Limitations of the study"). Most of that nuance is lost in the "highlights" section though, I would agree.
(I'm not GP) My answer would be that it's the difference between "Why do mosquitos have differential attractions to humans" vs "Why do mosquitos like Subject 33 more than Subject 28." The untestable assumption is that what makes Subject 33 more attractive to mosquitos than Subject 28 is generally applicable to the population at large. I agree intuitively with GP that finding 1/64 to be highly attractive in one sample and 2/5 in another would be surprising if this were following some binomial distribution, but the methodology for determining attractiveness was different between those participants (live test vs exposing mosquitos to a nylon that had been worn by the participant).
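The binomial intuition can be checked with a few lines. This is only a sketch: it assumes the 1/64 rate from the first phase is the true per-person probability and that the 5 additional people are independent draws, and it ignores the caveat above that the two groups were assessed with different methods.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed rate: 1 highly attractive person out of 64 in the first phase.
rate_from_first_phase = 1 / 64

# Chance of seeing 2 or more highly attractive people among 5 new participants.
probability = binomial_tail(n=5, k=2, p=rate_from_first_phase)
print(f"{probability:.4f}")
```

The result is a fraction of a percent, which is why 2 out of 5 looks surprising next to 1 out of 64 if both came from the same simple binomial process.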
If the professor from my statistics class is anything to go by, you just need a sample of more than 32 and you can assume it's a normal distribution /s
Python is almost entirely developed and maintained by volunteers. The increasing backlog of issues and PRs is recognized as a problem.
Thanks to a generous donation, the PSF has recently begun employing one "developer in residence", Łukasz Langa. He is specifically tasked with tackling this backlog, and has been doing a mighty good job so far.
Still, with over 1,000 open PRs and many times more open issues, we (the Python devs) could use more helping hands. For example, anyone can confirm that bugs reproduce, review PRs, or test fixes, and those are all meaningful help (when done thoughtfully and thoroughly).
I, and most core devs, volunteer happily and ask for nothing in return. If you think the situation outlined in the parent post isn't great and begs improvement, you're welcome to help!
Don't get me wrong, I'm a huge FOSS fan and proponent, but I think your post illustrates very clearly what is wrong with the state of OSS today. We are talking about a language like Python, which probably underpins profits on the order of tens of billions a year, and it is largely maintained by volunteers in their free time while the foundation can hire one dev! Somehow companies managed to outsource all the work to "the public" while keeping the profits for themselves.
That's the world we live in. Many banks have custom forks of Python and various other tools but they don't care to release their code. Illegal? Sure, but they're banks, they're kind of exempt as companies are.
A while ago someone posted here about Cray/HPE complaining to a bunch of volunteers on the GCC Fortran project that F2018 support was incomplete. The GCC team fully acknowledges that it is incomplete. It is not recommended for production applications. There are about 10 other compilers, some by companies such as NVIDIA and Intel, which would work perfectly fine and which have full F2018 support. But instead of using any of these, or, seeing as the one complaining is on the committee that created the F2018 specification, going out and fixing it themselves, they complained to like 6 people because their government project was falling behind, apparently unable to even comprehend that some group of internet communists will not do free work for them.
Literally idiotic. You're a representative of one of the world's largest companies when it comes to this kind of stuff and perhaps one of the most knowledgeable people on Fortran alive. Go fix it or stop using beta products by internet communists for government contracts.
I think the entire point of this post is that people do NOT feel welcome to help, and it is being suggested that their experiences are one reason there is such an unmanageable backlog.
I agree with everything you say, and alluded to much of it in the parent post. I myself also volunteer happily on a number of projects, and attempt to keep up with similar deluges on those projects.
For someone like myself who is not already a core Python maintainer, how would you recommend making improvements to things like the Python documentation if not by raising pull requests?
For a specific doc fix, a PR is indeed the way to go. If it's not a simple fix, creating an issue on the tracker may also be called for. Doing so does help and is appreciated, even if it sometimes takes a long time to be addressed by a core dev.
Otherwise, one could help in ways like I mentioned previously: reading existing issues, checking if they are still relevant and commenting accordingly, reviewing PRs and patches, etc.
> For a specific doc fix, a PR is indeed the way to go. If it's not a simple fix, creating an issue on the tracker may also be called for.
That is precisely what I did. As noted, there has been no indication (by activity on the issue or PR) that having filed these helped or is appreciated. Hence discouragement.
It is one of the hard lessons of open source maintainership that, without providing feedback to contributors, there is no demonstration of the project's appreciation. Therefore new contributors will (correctly!) conclude that the project does not appreciate these efforts.
I think the key point that you are making is that projects which need more help should place a higher priority on working through PR and issue backlogs, as those are a primary way of taking people who are willing to help and converting them into more dedicated and integrated maintainers. Letting those backlogs languish because you don't have enough volunteers becomes a self-reinforcing cycle that gets harder and harder to break out of.
I'd love it if someone could solve the problem with package management in Python. Make Pip Great Again: faster and more stable than poetry. Something like a Kickstarter campaign would work. I'd give you money.
At least for the Python project, there are more than enough inherent "roadblocks" for new contributors to overcome. Work is being actively done to remove some of these, or at least make overcoming them easier.
Python is well into the process of transitioning from bugs.python.org to GitHub issues. See PEP 581[1] for the rationale. Ezio Melotti is leading the project, and one may follow the progress on the psf/gh-migration GitHub repo[2] where the work is being managed (including specifically the Projects tab).
Hopefully, that should help make things smoother and more accessible for potential contributors.
I am very much in favor of making contributions easier, but doesn't GitHub seem short-sighted?
It seems to be veering in a direction of some kind of weird cloud IDE thing, which is fine in and of itself, but what happens if/when it is no longer suitable as a general dev platform?
Python has full and total control over bugs.python.org, whereas GitHub is a proprietary platform. Might it not be a mistake to cede control over the issue tracker? It seems like the main problem in the article is the CLA-signing process anyway, not BPO.
It’s fantastic that Python is moving to GitHub. It’s so frustrating that projects like Django, Emacs, Python, etc. have insisted on using these antiquated development workflows, and that some of the old-timers even insist that they are better and that GitHub is for “mindless kids who put PRs up thoughtlessly”.
Sounds like you should go and read PEP 581 before criticizing their decision!
I admit I only skimmed the PEP, and I think their rationale is mostly sound. It's a concession to practicality, given the surprisingly limited resources of the Python core development group.
I certainly don't think the problem with GH is that it's for "mindless kids". I hate mailing lists and I hate old-school bug trackers that don't support code markup or rich linking. In the short and medium term, I'm grateful that things will become a lot easier. The long term is what worries me.
But I'm also probably being a bit too cynical. If and when in 5-10 years they want to move off of GH, they will be able to do so. I'm not envisioning some kind of catastrophic "rug pull" from Microsoft where suddenly the GitHub API disappears and the issue tracker becomes locked in.
Right, exactly. It's moving to issues and PR-based workflows that's the important thing. If GitHub becomes problematic in the future, they can switch to some other provider of similar functionality, and all the millions of programmers around the world who are familiar with issues and PR-based workflows can follow them.
I am personally someone who only knows PR-based workflows and is somewhat ignorant about other workflows. There are the mailing lists and old-school bug trackers which I am pretty sure I don't like. But I know that several (most/all?) big US tech companies use non-PR-based workflows via tools like Gerrit and Phabricator. I am not at all clear yet on the pros and cons of those workflows versus PRs.
GitHub isn't set up to stop you from moving your project to other hosting providers. You can export from GitHub to other git hosting setups (like a self-hosted GitLab).
Probably not, but you could probably find or write a script to close all the GitHub issues with a final comment linking to the duplicated issue on the new tracker.
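For what it's worth, such a script could look roughly like the sketch below. It relies on the GitHub REST API endpoints for commenting on an issue and updating an issue's state. The repository name, the issue-number-to-URL mapping, the new tracker URL, and the helper names are all hypothetical, and the `send` helper is defined but never called here because it needs a real token.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def plan_issue_closure(owner, repo, issue_number, new_url):
    """Return the two REST calls that leave a pointer comment and close one issue."""
    base = f"{API_ROOT}/repos/{owner}/{repo}/issues/{issue_number}"
    return [
        # Add a final comment linking to the copy on the new tracker.
        ("POST", f"{base}/comments", {"body": f"This issue has moved to {new_url}"}),
        # Then close the original issue.
        ("PATCH", base, {"state": "closed"}),
    ]


def send(method, url, payload, token):
    """Send one planned call with a personal access token (not invoked in this sketch)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical mapping: old GitHub issue number -> URL on the new tracker.
moved = {101: "https://tracker.example.org/issues/1"}

# Dry run: print what would be sent instead of sending it.
for number, new_url in moved.items():
    for method, url, payload in plan_issue_closure("example-org", "example-repo", number, new_url):
        print(method, url, payload)
```

Separating the planning step from the sending step makes it easy to review the full list of changes before touching any real issues.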
CLAs are the reason for the harsh language. There's no reason to make contributors create and link a half-dozen accounts and read seven hundred pages of legalese to contribute to your project. It's a real barrier to contributions (see: the linked article), not to mention the time projects waste setting up these broken automated CLA systems, just to justify some lawyer getting a paycheck.
Most of these libraries focus on calculating distance metrics between pairs of strings, or finding the nearest match to a string among many candidates.
On the other hand, fuzzysearch is used for finding near matches to a string within much longer strings or texts. Common use cases are fuzzy searches in DNA sequences and long texts such as books and articles.
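To illustrate the difference, here is a small standard-library sketch of searching for near matches inside a longer text. It is not fuzzysearch's actual implementation or API; it is the classic dynamic-programming approach (often attributed to Sellers) in which a match may start anywhere in the text, and the function name is made up for this example.

```python
def near_match_ends(pattern, text, max_dist):
    """Return (end_index, distance) pairs for near matches of pattern in text.

    A pair (end, d) means some substring of text ending just before index
    `end` is within Levenshtein distance d <= max_dist of pattern.
    """
    m = len(pattern)
    # column[i]: fewest edits turning pattern[:i] into some substring of
    # text that ends at the current position.
    column = list(range(m + 1))
    hits = []
    for end, ch in enumerate(text, start=1):
        prev_diag = column[0]
        column[0] = 0  # an empty prefix matches anywhere at zero cost
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new_val = min(
                prev_diag + cost,   # match or substitute
                column[i] + 1,      # extra character in text
                column[i - 1] + 1,  # character missing from text
            )
            prev_diag = column[i]
            column[i] = new_val
        if column[m] <= max_dist:
            hits.append((end, column[m]))
    return hits


print(near_match_ends("PATTERN", "---PATERN---", max_dist=1))
```

A pairwise distance function would compare "PATTERN" against the whole 12-character string and report a large distance, whereas this kind of search finds the one-edit match "PATERN" embedded in it.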
I'm looking for great high-school Math teachers for a project. If you wouldn't mind sparing a few moments, please get in touch with me via email (gmail with my username) so I can give you some more details.