Plans for Data-Forge version 2 #108
Comments
Adding my voice towards better support for large datasets
Some great ideas there! I'm not well versed enough in DataForge to give strong suggestions but I have written a couple pipe/stream libraries in the past and have some thoughts.
Example:
There's a lot to take away there, but the main point I want to drive home is that, if possible, the core DataFrame and Series interface should stay as simple as possible, with external libraries doing the manipulations.
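A minimal sketch of that idea, assuming a hypothetical `MiniSeries` class whose only extension point is `pipe`, with the operators living outside the core. None of these names (`MiniSeries`, `mapOp`, `filterOp`, `sumOp`) are Data-Forge APIs; they only illustrate the shape being suggested.

```typescript
// Hypothetical minimal core: it holds values and exposes pipe(), nothing else.
class MiniSeries<T> {
    constructor(private readonly values: readonly T[]) {}

    // Hand the values to an external operator and return whatever it produces.
    pipe<R>(op: (values: readonly T[]) => R): R {
        return op(this.values);
    }

    toArray(): T[] {
        return this.values.slice();
    }
}

// "External library" operators: plain functions that know nothing about the core's internals.
const mapOp = <T, U>(fn: (value: T) => U) =>
    (values: readonly T[]): MiniSeries<U> => new MiniSeries<U>(values.map(fn));

const filterOp = <T>(pred: (value: T) => boolean) =>
    (values: readonly T[]): MiniSeries<T> => new MiniSeries<T>(values.filter(pred));

const sumOp = () =>
    (values: readonly number[]): number => values.reduce((acc, value) => acc + value, 0);

// Keep the even numbers, multiply them by 10, then add them up.
const pipedTotal = new MiniSeries([1, 2, 3, 4])
    .pipe(filterOp((x: number) => x % 2 === 0))
    .pipe(mapOp((x: number) => x * 10))
    .pipe(sumOp());
```

The trade-off in a design like this is that the core stays tiny, but every operator has to be imported separately, which may or may not count as easier to use.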
The naming I chose was poor. Pipe seems better, but DataForge has a few kinds of piping operations (such as summarizing and grouping), which makes it complicated.
I'm pretty excited about that possibility. Imagine if DF 2 is comparable to pandas in terms of performance. There are more JS programmers than Python programmers. And R is a true mess for readability. JS is better for visualization given its close connection to HTML. Plus, the transition from exploration to production might be easier in JS than in Python and definitely easier compared to R, MATLAB, etc.
Added an example for future discussion in a separate thread here.
Some good ideas there @rat-matheson.
I've started a new discussion for a Data-Forge core library #111. Please give feedback there!
Add support for Parquet
Sounds cool. Is there a JavaScript library for it?
Hey! Love this. I have some ideas for marketing. (I'm not a very experienced programmer.) I'm thinking about using this library, and therefore JS instead of Python, for my analysis. I would have liked to see at first sight how often the library is updated and how many people use it, preferably somewhere here: http://www.data-forge-js.com/ If potential new users see that the community is growing and the library is evolving, they will be more likely to try it and to participate in its growth, and maybe in development or funding. I, for example, would love to send a small recurring donation (Patreon-style) if possible. This is something JavaScript was missing. Let me know what you think of these ideas. Cheers! EDIT:
Hi @Spyrator, thanks for your feedback and your sponsorship! I really appreciate it.
Speaking of marketing and the website, I want to give my 2 cents about the website. The hero banner has to go. It takes up half the screen real estate, giving the user a reason to bail before knowing how good the lib is. The image and coloring also make it look un-modern, giving the library a very dated feel. EDIT: Sorry for the slightly off-topic note, but since the website is not OSS, this repo seems like the only place to voice my thoughts on it.
@fuzzthink is that something you can help with? I'd be happy for someone else to rebuild the landing page. I can do HTML and CSS, but my design skills leave a lot to be desired!
@ashleydavis I'll be happy to help, but I'm not sure if I have time for it. But if you open source it, and state that you welcome UX changes, maybe others will jump in. Once you create a repo for it, I can take a look and we can chat in Issues there to possibly get the ball rolling with the banner change first.
It's already open source here: https://github.com/ashleydavis/data-forge-landing-page But probably best to throw it away and start again! I'm happy to create a new repo in the DF org if you want to have at it.
Hi Ashley, for this one, "Better support for statistics (e.g. linear regressions, correlation, etc)", your answer is "I'm already working through this in v1". What exactly does that mean? Have they been added in v1? I couldn't find the corresponding APIs.
Yes @e-tang, I was working on this at the end of 2021. There is some documentation but it's far from complete and I hope to improve it and continue the work in the future. If you feel like contributing please do! Here's the main page that links to other resources: Here's the API documentation: Here are the statistics functions I've added already:
Thanks Ashley, It would be great if we can have correlation, regression, etc. Are these in the agenda? It will be really nice to have them.
Yeah I would love to add those, I'm just not sure at the moment when I'll have time to come back to it.
@bananensplit actually uses Day.js, which is a smaller replacement for Moment.js. I don't think Data-Forge needs to change for this though. You can easily convert any series to a series of Moment objects. If you find any problem with it, please submit an issue.
I know this is an old thread, but I think this feedback is still relevant for these:
Quick feedback
Design Feedback (Lazy eval + Ease of use)
While I'm an imperative programmer, I think moving away from lazy evaluation (internally) would be a HUGE mistake. There is a whole universe between eager and lazy, and many applications require a mix (getters, iterators, signals, etc). From an engine's point of view, it's easy to implement an eager API on top of a lazy one. It only took me ~20 lines to make
However, the opposite direction, trying to implement a lazy API on top of an eager API, is a nightmare of complexity. I've done it for pandas. Ease of use, I believe, is an issue of transparency, control, expectations, and bulk.
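To make the "eager on top of lazy" direction concrete, here is a rough sketch using generators. `LazySeq` and `EagerSeq` are hypothetical names invented for illustration; this is neither Data-Forge's engine nor the commenter's code, just one way the layering could look.

```typescript
// Hypothetical lazy core: operations only wrap a generator, nothing runs until iteration.
class LazySeq<T> implements Iterable<T> {
    constructor(private readonly source: () => Iterable<T>) {}

    // Assumes `values` can be iterated more than once (e.g. an array).
    static from<V>(values: Iterable<V>): LazySeq<V> {
        return new LazySeq<V>(() => values);
    }

    map<U>(fn: (value: T) => U): LazySeq<U> {
        const source = this.source;
        return new LazySeq<U>(function* () {
            for (const value of source()) {
                yield fn(value);
            }
        });
    }

    filter(pred: (value: T) => boolean): LazySeq<T> {
        const source = this.source;
        return new LazySeq<T>(function* () {
            for (const value of source()) {
                if (pred(value)) {
                    yield value;
                }
            }
        });
    }

    [Symbol.iterator]() {
        return this.source()[Symbol.iterator]();
    }

    toArray(): T[] {
        return Array.from(this);
    }
}

// Hypothetical eager facade: same operations, but each one is evaluated immediately
// by delegating to the lazy core and materializing the result in the constructor.
class EagerSeq<T> {
    private readonly values: T[];

    constructor(source: Iterable<T>) {
        this.values = Array.from(source);
    }

    map<U>(fn: (value: T) => U): EagerSeq<U> {
        return new EagerSeq<U>(LazySeq.from(this.values).map(fn));
    }

    filter(pred: (value: T) => boolean): EagerSeq<T> {
        return new EagerSeq<T>(LazySeq.from(this.values).filter(pred));
    }

    toArray(): T[] {
        return this.values.slice();
    }
}

// Count how often the mapping function runs to see when work actually happens.
let calls = 0;
const tenfold = (x: number): number => { calls += 1; return x * 10; };
const isEven = (x: number): boolean => x % 2 === 0;

const lazy = LazySeq.from([1, 2, 3, 4]).filter(isEven).map(tenfold);
const callsAfterLazyBuild = calls;   // nothing has been evaluated yet

const eager = new EagerSeq([1, 2, 3, 4]).filter(isEven).map(tenfold);
const callsAfterEagerBuild = calls;  // the eager chain has already run tenfold

const lazyResult = lazy.toArray();   // the lazy chain runs only now
```

The eager facade is a thin wrapper because materializing a lazy sequence is a single call; going the other way would mean deferring work that an eager engine has already done.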
Before considering changing the engine I'd strongly consider:
Thanks so much for your input @jeff-hykin. Getting detailed feedback like this is very motivating and I agree with practically all of what you have said. This thread is old, but if and when DF v2 goes ahead, more planning will have to be done and I'll call on you then to help. The hardest thing is the marketing! So if you do get ideas on that please continue this thread. Happy new year!
This issue is to discuss plans for version 2.
These are just ideas for the moment. I haven't started on this and am not sure when I will.
Plans for v2:
toArray).
- Better support for statistics (e.g. linear regressions, correlation, etc).
  I'm already working through this in v1.
- Investigate replacing iterators with generator functions.
  I've investigated this now and it doesn't seem possible.
- Add map, filter and reduce functions (this is done now), deprecate select and where functions (make it more JavaScript-like).
- Support streaming data (e.g. for processing massive CSV files).
  Ideally DF would be async first and be used to define pipelines for async streaming data, but does async go against the goal of making DF easier to use? Is there a way that I can make it so that async usage is friendly?

Stretch goals:
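On the streaming question in the plans above, one possible shape for a friendly async-first pipeline is async generators, where each stage pulls rows one at a time so the whole file never has to sit in memory. This is only a sketch under assumptions: `sampleLines`, `parseRows`, `filterAsync`, `mapAsync` and `reduceAsync` are hypothetical helpers, not Data-Forge functions, and the CSV parsing is deliberately naive (no quoting or escaping).

```typescript
// Stand-in for a file or network stream: yields one CSV line at a time.
async function* sampleLines(): AsyncGenerator<string> {
    yield "name,amount";
    yield "a,10";
    yield "b,20";
    yield "c,30";
}

// Turn lines into row objects, using the first line as the header.
async function* parseRows(lines: AsyncIterable<string>): AsyncGenerator<Record<string, string>> {
    let header: string[] | undefined;
    for await (const line of lines) {
        const cells = line.split(","); // naive split, assumes no quoted commas
        if (header === undefined) {
            header = cells;
            continue;
        }
        const row: Record<string, string> = {};
        header.forEach((column, i) => {
            const cell = cells[i];
            row[column] = cell === undefined ? "" : cell;
        });
        yield row;
    }
}

async function* filterAsync<T>(source: AsyncIterable<T>, pred: (value: T) => boolean): AsyncGenerator<T> {
    for await (const value of source) {
        if (pred(value)) {
            yield value;
        }
    }
}

async function* mapAsync<T, U>(source: AsyncIterable<T>, fn: (value: T) => U): AsyncGenerator<U> {
    for await (const value of source) {
        yield fn(value);
    }
}

async function reduceAsync<T, A>(source: AsyncIterable<T>, fn: (acc: A, value: T) => A, seed: A): Promise<A> {
    let acc = seed;
    for await (const value of source) {
        acc = fn(acc, value);
    }
    return acc;
}

// Sum the "amount" column for every row except the one with amount 10.
// Nothing is read until reduceAsync starts pulling from the pipeline.
const streamedTotal: Promise<number> = reduceAsync(
    mapAsync(
        filterAsync(parseRows(sampleLines()), (row) => row["amount"] !== "10"),
        (row) => Number(row["amount"]),
    ),
    (acc, value) => acc + value,
    0,
);
```

The only async surface the caller sees here is the final promise, which is one way the map/filter/reduce vocabulary could stay the same while the data source becomes a stream.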