by oftfrfbf
Super Massive Data Drop Is Live!
After much anticipation and a short delay on release day, Numerai’s Super Massive Data Drop went live on Sep 8th at 23:07 UTC.
Today, we are releasing a new version of the Numerai dataset that massively increases the amount of embedded information with 3x features and 5x training data, and unlocks a whole new dimension of research possibilities with 20x new targets. -Numerai
There are now 1050 features instead of 310, and a total of 679 training and validation eras with targets provided instead of 142.
According to the Super Massive Data Release: Deep Dive:
The eras are now weekly instead of monthly. This means that eras match the tournament more precisely, however they are now “overlapping”. This means that nearby eras are correlated with one another because their targets are generated from stock market performance from a shared, or “overlapping”, period of time.
The new “training” period covers the same time period as eras #1-132 in the old data, but is now weekly rather than monthly.
The new “test” period is the same as the previous “test” period.
The new “validation” period covers the same time period as eras #197-212 in the old data plus an additional time period, and is now weekly rather than monthly.
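To make the overlap concrete: 20 trading days is roughly four weeks, so a 20-day target shares part of its window with the next few weekly eras. One simple way to get eras whose 20-day windows mostly do not overlap is to keep every fourth one. The sketch below is only an illustration under that assumption; the function name non_overlapping_eras, the step of 4 and the zero-padded era labels are hypothetical choices, not something taken from the official example scripts.

```python
def non_overlapping_eras(eras, step=4, offset=0):
    """Return every `step`-th era in sorted order, starting at `offset`.

    With weekly eras and 20-day targets, step=4 is an assumed spacing that
    keeps the target windows of the selected eras roughly disjoint.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    unique_eras = sorted(set(eras))
    return unique_eras[offset::step]


# Hypothetical weekly era labels "0001" .. "0012" (zero-padded so they sort correctly).
weekly_eras = [f"{i:04d}" for i in range(1, 13)]
kept = non_overlapping_eras(weekly_eras, step=4)
print(kept)
```

Changing offset gives a different, equally spaced subset, which is one way to build several training sets from the same data.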
The final major change is that there are now many different targets in the dataset. The tournament target, which is the one you are scored on, is always called “target”. Currently “target” corresponds to “target_nomi_20”, but this may change in the future. However you will also find 20 more targets which are not scored on, but you may find useful for training. The 20 targets consist of 10 different types of targets constructed using 2 different time periods, 20 and 60 days. Additional targets may also be added in the future.
Be aware that some of the new targets have different binning distributions than what you see with Nomi, i.e. 7 bins rather than 5, with less rigid constraints on samples per bin. Training models to be good at multiple targets and/or ensembling models trained on different targets is a great way to improve generalization performance and increase the uniqueness of your model.
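As a rough illustration of ensembling models trained on different targets, the sketch below averages the percentile ranks of each model's predictions. Ranking first is one common way to combine predictions that live on different scales, but it is an assumption here, not a method prescribed by Numerai. The names rank_pct and ensemble_by_rank are hypothetical, and "another_target_60" is a placeholder key, not a confirmed column name.

```python
def rank_pct(values):
    """Convert values to percentile ranks in (0, 1]; ties share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the group of tied values starting at position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def ensemble_by_rank(predictions_by_target):
    """Average percentile ranks across models (one prediction list per target)."""
    if not predictions_by_target:
        raise ValueError("need at least one set of predictions")
    ranked = [rank_pct(p) for p in predictions_by_target.values()]
    if len({len(r) for r in ranked}) != 1:
        raise ValueError("all prediction lists must have the same length")
    return [sum(col) / len(ranked) for col in zip(*ranked)]


# Toy predictions for four rows from two models trained on different targets.
preds = {
    "target_nomi_20": [0.1, 0.4, 0.3, 0.9],
    "another_target_60": [0.2, 0.1, 0.5, 0.8],  # placeholder target name
}
ensemble = ensemble_by_rank(preds)
print(ensemble)
```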
The new data can be accessed either through the “Download Data” button in the leaderboard sidebar or through s3 links returned by the dataset API using the filename argument; a list of valid filenames can be retrieved through the new list_datasets API query. The new training_data and validation_data files will be the same every week, while the tournament_data file will be updated with the latest live era. Parquet and CSV versions of these files will be available at the start of the round each Saturday; you may retrieve data for past rounds using the round argument of the dataset and list_datasets APIs.
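The download flow described above (list the valid filenames, then fetch the ones you want for a given round) might look something like the sketch below. It assumes a client object that mirrors the numerapi package's list_datasets and download_dataset methods; check the numerapi documentation for the current signatures before relying on it. The helper download_round_files and the FakeAPI stand-in are hypothetical and exist only so the example runs offline.

```python
import os


def download_round_files(napi, wanted, round_num=None, dest_dir="."):
    """Download each file in `wanted` that the API lists for the given round.

    `napi` is assumed to offer list_datasets(round_num=...) and
    download_dataset(filename, dest_path, round_num=...), as numerapi does.
    Returns the filenames that were actually requested.
    """
    available = set(napi.list_datasets(round_num=round_num))
    downloaded = []
    for filename in wanted:
        if filename not in available:
            print(f"skipping {filename}: not listed for round {round_num}")
            continue
        dest_path = os.path.join(dest_dir, filename)
        napi.download_dataset(filename, dest_path, round_num=round_num)
        downloaded.append(filename)
    return downloaded


class FakeAPI:
    """Offline stand-in (hypothetical) that records calls instead of downloading."""

    def __init__(self, files):
        self.files = list(files)
        self.calls = []

    def list_datasets(self, round_num=None):
        return list(self.files)

    def download_dataset(self, filename, dest_path=None, round_num=None):
        self.calls.append((filename, dest_path, round_num))


fake = FakeAPI(["numerai_live_data.parquet", "numerai_validation_data_int8.parquet"])
got = download_round_files(
    fake,
    ["numerai_live_data.parquet", "not_a_real_file.parquet"],
    round_num=280,
    dest_dir="data",
)
print(got)
```

With numerapi installed, passing a numerapi.NumerAPI() instance in place of FakeAPI should exercise the real endpoints, provided the filenames are still offered for the chosen round.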
The official example scripts for the Numerai Data Science Tournament
Super Massive Data Drop Initial Feedback and Fixes
by aventurine
Two of the biggest questions asked in the community forums and the rocket.chat regarding the new dataset were:
How do we compare our old model performance to our new model performance?
I’m having RAM issues. How do I fix?
MikeP addressed both of these concerns in an announcement on 09/09/21:
1. There's a new file accessible via api called old_data_new_val.parquet. Using the utils in the new example scripts you can run download_data(napi, 'old_data_new_val.parquet', 'old_data_new_val.parquet', round=280). This will give you the old data, but over the exact same period as the new validation. You will then be able to run your existing models and submit the predictions to diagnostics to get a 1 to 1 comparison against models built on the new data.
2. I've placed new files called numerai_validation_data_int8.parquet, numerai_training_data_int8.csv, etc. These have features as integers 0 to 4, which result in DataFrames about 30% as large.
I've also added numerai_live_data.parquet and numerai_live_data_int8.parquet, which only contain the live era each week.
The int8 files will be available for each round so you can make your pipelines expect those if you're having RAM issues.
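To see why integer features help with RAM, here is a toy sketch using only the standard library. It assumes the regular files store each feature as a 32-bit float taking one of five evenly spaced values between 0 and 1, which the int8 files replace with the integers 0 to 4; that storage detail is an assumption on our part, and to_int8_bins is a hypothetical helper.

```python
from array import array


def to_int8_bins(float_features, n_bins=5):
    """Map floats in {0, 0.25, ..., 1} to integer bins 0..n_bins-1 stored as signed bytes."""
    scale = n_bins - 1
    return array("b", (int(round(x * scale)) for x in float_features))


# 5,000 toy feature values stored as 32-bit floats (assumed original storage).
floats = array("f", [0.0, 0.25, 0.5, 0.75, 1.0] * 1000)
ints = to_int8_bins(floats)

float_bytes = floats.itemsize * len(floats)
int_bytes = ints.itemsize * len(ints)
print(f"float32: {float_bytes} bytes, int8: {int_bytes} bytes")
```

In this toy case the feature values alone shrink to a fraction of their float size. The "about 30%" figure quoted above refers to whole DataFrames, which also carry non-feature columns such as ids, eras and targets.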
I think we can all appreciate the speed of the response from the team at Numerai!
Data Drop Memes
With so much activity and anticipation around the dataset drop, the #general and #meme channels on rocket.chat were littered with fire memes, so many that they need their own section in this month's edition. This is for you, slyfox!
by slyfox
by PeterS
by The_Lords_Prior
by ark
by jeremy.berros
Numerbay.ai Updates
09/05/2021: The following includes changes from both the crypto_payment branch and the master branch, which have been merged today. (On-platform features are disabled as they are not completed yet. The eventual roll-out will be a turnkey operation.)
Completed on-platform checkout experience and basic order / sales management
Completed a successful on-platform NMR test transaction using Numerai wallets (Good news: the transaction took only seconds to confirm): Etherscan
Added option to deactivate product without deletion, and option to set automatic expiration of product after a certain round. The round number on the product page now indicates the selling / pre-selling round (instead of the current active tournament round)
Added notifications in multiple places and rearranged profile pages to enhance UX
Thorough linting of backend code
Stricter input validation for both frontend and backend REST endpoints (API docs available here (Swagger) and here (ReDoc))
Other issue fixes and minor improvements
Work plan is in post #2
Next week to focus on seller submission experience and API automation of submission and distribution
09/12/21: The following includes changes from the crypto_payment branch only, which are not live.
Completed storage integration with GCS
Completed basic on-platform file distribution experience. Sellers can manage product artifacts for each round, and buyers with confirmed orders can download via dynamically generated temporary links
Started working on email notifications
Other issue fixes
Work plan is in post #2
Next week to focus on further improving user experience, comprehensive API unit tests and UI integration tests
09/19/21: The following includes changes from both the crypto_payment branch and the master branch, which have been merged today. (On-platform features are disabled until release. The eventual roll-out will be a turnkey operation.)
NumerBay on-platform sales is ready for beta, coming around next Tuesday
Added example notebook for seller file distribution automation
Email notification has been implemented but disabled until some service provider issues are resolved
Achieved automated unit test coverage of 75%; the rest was tested manually because of interactions with external systems. Frontend integration testing was done manually, pending future automation
Various other issue fixes and minor improvements
Work plan is in post #2
Next week to roll out the beta release, do a walkthrough stream, and fix any issues that come up after the release
09/26/21: The following includes changes from the master branch
Rollout of NumerBay on-platform sales live for beta
Walkthrough demo on OHwA
Enabled email notification for: [Sellers: New Confirmed Sale ; Buyers: New Order, Order Confirmation, Order Expiration]
Attempted to fix round rollover timing issues
Fixed Signals model performance metrics
Various other issue fixes and minor improvements
Work plan is in post #2
Next week to focus on the final milestone for core features: the Stake Mode
There are 56 “classic” listings, 11 signals listings
OnlyFams Merch Store Coming soon….
CoE Updates
[Amendment] NumerBay Funding Rate Equalization - The CoE has settled on a base rate of $50 an hour for project funding
[Governance Proposal] Abridged community review for small retroactive bounties
[Closed] Proposal Idea Contest
[Milestone] NumerBay On-Platform Sales Beta
[Discussion] Does anyone think we are missing the boat on NFTs?
Rejected projects:
Sponsor OHwA and DSC
Salaried Facilitator, Chief of Staff Position
Signals Python example using open satellite data
New projects, not yet voted on:
Bounty for high quality data science posts
Proposal Idea Contest
Extreme Makeover /r/numerai Edition
Ethereum Grant for on-chain Numerai-related projects
Be sure to monitor the kanban board on GitHub for New, Approved, Done and Rejected projects.
Transaction Report
-41 NMR Community Marketplace
-8.3333 NMR Newsletter
-34 NMR Community Marketplace
-10 NMR discretionary bounty to OF_S for his code contributions to help R users download data and submit predictions with the new super massive dataset
-35 NMR Community Marketplace
-41 NMR Community Marketplace
arbitrage’s OHwA and DSC Ending 09/30
At the end of Sep 2021, CoE member arbitrage held his final Office Hours Zoom stream and his final “Daily Scores and Chill” Twitch stream, having streamed for the Numerai community since 2019 and through the covid pandemic of 2020. After being elected and accepted into the CoE as one of the 7 elders, arbitrage submitted a proposal for the CoE to take over the funding of his streams from Numerai. In his proposal, arbitrage stated:
There’s one problem with OHwA, and that is the inability to talk about certain topics given that Numerai is funding and hosting the videos on their official channels. DSC is not hosted “officially” but is funded by Numerai, so we are able to discuss topics of interest that we cannot otherwise talk about on OHwA.
He proposed that the Council sponsor his “Office Hours with Arbitrage” and “Daily Scores and Chill” to allow all topics open with an immediate continuation of streaming after approval.
As a voting member of the Council of Elders, he would abstain from voting on or executing transactions for this proposal. Approval would require four of six CoE members voting in favor.
After much discussion in the forum post and on the CoE rocket.chat channel (far too much to cover in the newsletter; see the links below for the full discussions), a community vote was cast with 24 No votes and 18 Yes votes.
The CoE went with the community vote and rejected the proposal.
Although the proposal was rejected, there was an outpouring of support on rocket.chat for all the work he has done for the community.
[PROPOSAL] Sponsor OHwA and DSC
Signals Stake Boost
$NMR News-Index Coop’s New “ETF” Index Token $DATA
On 09/23/21, Index Coop, a DeFi space company that creates and maintains crypto index products, released a new index token called the Data Economy Index. The Data Economy Index (DATA) is a digital asset index capturing the growth of on-chain data economies. The Data Economy Index tracks projects with significant economic activity providing data-based products or services.
The initial nine components of DATA are LINK (25%), RENFIL (25%), GRT (25%), BAT (11.6%), LPT (5.2%), OCEAN (2.5%), NMR (2.4%), OXT (2.1%) and REPv2 (1.3%)
The initial nine components of DATA represent approximately $17.9 billion in market value and have no overlap with the Index Cooperative’s two existing index products.
Thomas_Hepner, a past Numerai participant and current holder of NMR, is the creator of the Data Economy Index and a Methodologist for Index Coop.
You can get more information on Index Coop and the $DATA token below
Other News
On 09/10/21 2chanes states “the model limit has been increased to 50”
Jonathan states Numerai is looking to hire a web developer! https://angel.co/company/numerai/jobs/1257894-senior-web-developer
Forum post by nyuton: How to ensemble models
Forum post by mdo: Feature Selection with BorutaShap
Numerai Mentioned in the Bankless podcast #82 - Investing in the Future with | Cathie Wood, Chris Burniske, Yassine Elmandjra | September 13, 2021 https://overcast.fm/+YNeScOMKY/1:21:18
There is a possibility Numerai will look to transition to Discord from rocket.chat in the coming future
Office Hours: QE Dashboard Walkthrough
Signals Roundtable: hit the round running with 20D targets ft. LiamHz, jrAI and jrb
Numerai's Super Massive Data Release: Debrief with The Michaels on Office Hours w Arbitrage
Example Script Presentation
Disclaimer: This is not an official Numerai newsletter. It is sponsored by the Numerai CoE, a decentralized autonomous organization. Every effort is made to provide accurate and complete information, but there are no claims, promises or guarantees about the accuracy of the contents.