GitHub datasets

GitHub - google-research-datasets/con

Request a dataset by opening a Dataset request GitHub issue. Datasets are sorted by year of publication.

Originally published at the UCI Machine Learning Repository as the Iris Data Set, this small dataset from 1936 is often used for testing machine learning algorithms and visualizations (for example, scatter plots).

For each, sample CSV files range from 100 to 2 million records.

Contribute to selva86/datasets development by creating an account on GitHub. This repository was created to ensure that the datasets used in tutorials remain available and are not dependent upon unreliable third parties.

A collection of datasets composed of software vulnerability metadata scraped from CVE Details, software metrics and security alerts generated by Static Analysis Tools (SATs), and version control metadata retrieved from each project's repository.

Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation from the data.

A public repo of datasets.

Code for robust monocular depth estimation described in Ranftl et al., "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer", TPAMI 2022 (isl-org/MiDaS).

You will find a copy of the GPL in the Rdatasets GitHub repository.

The dataset contains 11,639 images selected from the Open Images dataset, providing high-quality word (~1.2M), line, and paragraph level annotations.

Figure 1: SWIM-IR dataset generation process.

Download and play with key datasets from Google Trends, curated by the Trends Data Team at Google.

Contribute to ajaykuma/Datasets_For_Work development by creating an account on GitHub.

Twitch-full: See the Google Drive folder containing all Twitch files.
Explore popular topics like government, sports, medicine, fintech, food, and more.

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs.

Below are four reasons, with accompanying real-world examples.

Web Server Logs.

Finally, complexity can be assessed using other LLMs acting

The dataset includes a list of all the stocks contained therein.

Run a container and join an interactive session, you will be

Measuring accuracy can be easy for mathematical problems, where answers can be checked with a Python interpreter, or near-impossible for open-ended, subjective questions.

There are various moderating variables that affect all data, many

This repository contains 20 annotated datasets for Automatic Keyphrase Extraction made available by the research community.

No Blockchains.

Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP) - niderhoff/nlp-datasets.

There are a number of reasons why a dataset that seems like it should exist might not, and they are all tied to the quiet complications inherent in data collection.

Datasets included in this repo serve for

Awesome Public Datasets: GitHub.

The dataset aims to facilitate analysis and exploration of agricultural trends, crop diversification, and regional

LLMDataHub (Zjh-819/LLMDataHub): a quick guide (especially) for trending instruction fine-tuning datasets.

Cached datasets (Arrow files) will still be reloaded from the datasets cache, by default at ~/.cache/huggingface/datasets.

Nodes are developers who have starred at least 10 repositories, and edges are mutual follower relationships.
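The remark about checking mathematical answers with a Python interpreter can be made concrete. The sketch below is a minimal illustration, not taken from any of the datasets mentioned: it assumes each sample stores a plain arithmetic expression (`question`) and a claimed numeric answer (`answer`), both hypothetical field names, and it evaluates the expression with Python's `ast` module rather than `eval()`, so arbitrary code inside a sample cannot run.

```python
import ast
import operator

# Map AST operator node types to the arithmetic they represent.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def safe_eval(expr: str) -> float:
    """Evaluate a pure arithmetic expression without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. are rejected outright.
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expr, mode="eval"))


def answer_is_correct(expression: str, claimed: str, tol: float = 1e-9) -> bool:
    """Check a sample's claimed numeric answer against the evaluated expression."""
    try:
        return abs(safe_eval(expression) - float(claimed)) <= tol
    except (ValueError, ZeroDivisionError, SyntaxError):
        return False


# Hypothetical instruction-dataset samples.
samples = [
    {"question": "(17 + 5) * 3", "answer": "66"},
    {"question": "2 ** 10 - 24", "answer": "1000"},
    {"question": "7 / 2", "answer": "3"},
]
flags = [answer_is_correct(s["question"], s["answer"]) for s in samples]
print(flags)  # [True, True, False]
```

Open-ended answers have no such executable check, which is why the text calls them near-impossible to score this way.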
Make sure that the Docker user has sufficient rights and around 90 GB of available space on its hard drive.

This is a list of openly available electrophysiological data, including EEG, MEG, ECoG/iEEG, and LFP data.

It includes detailed information on crop production, yield, acreage, and other relevant agricultural metrics at the state level.

Breaking changes: removed deprecated arguments like use_auth_token, fs, or ignore_verifications. Remove beam by @albertvillanova in #6987. Remove deprecated code by @albertvillanova in #6996.

A collection of datasets of ML problem solving.

The list is separated into Free and Paid and broken into subsections based on loose categories.

Follow the instructions in the new post for how to download the data.

Dataset / Download Link / Code: FE4395, Download, xlel (Visible Image Guided Thermal Image Super-Resolution Dataset).

These CSV files can be used for testing purposes.

Which one would you pick?

🤗 Datasets is a library for easily accessing and sharing datasets for Audio, Computer Vision, and Natural Language Processing (NLP) tasks.

499 positive samples and 78 negative samples.

Data Collection: Berant et al. use the Google Suggest API as the basis for generating questions.

Feel free to dig in.

GitHub Pages for the CORGIS Datasets Project.

A curated list of datasets, publicly available for machine learning research in the area of manufacturing - nicolasj92/industrial-ml-datasets.
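The ~90 GB requirement can be checked before pulling a large image. A minimal sketch using the standard library's `shutil.disk_usage`; the path to check and the decimal-gigabyte convention are assumptions, so point it at whichever directory backs Docker's storage on your machine.

```python
import shutil


def has_free_space(path: str = ".", required_gb: float = 90.0) -> bool:
    """True if the filesystem containing `path` has at least `required_gb` GB free.

    Uses decimal gigabytes (10**9 bytes); use 1024**3 instead if you think in GiB.
    """
    free_bytes = shutil.disk_usage(path).free  # named tuple: total, used, free (bytes)
    return free_bytes >= required_gb * 10**9


if not has_free_space("."):
    print("Less than ~90 GB free here; the dataset download may not fit.")
```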
These two datasets are blended into a single product to produce the combined global land and ocean temperature

In this repository, we present information on datasets that have been used for hate speech detection or related concepts such as cyberbullying, abusive language, and online harassment, to make it easier for researchers to obtain datasets.

The scripts are kept in the scripts/import directory, followed by the international place code (e.g. br for Brazil, it for Italy).

kirenz/datasets: this repo contains data sets that are required in order to perform the applications and exercises.

Data is posted to social media every Monday morning.

The authors of this repository do not own any licenses for the datasets; the goal of the project is to provide a standardized collection of datasets for research purposes.

Contribute to YBI-Foundation/Dataset development by creating an account on GitHub.

As the number of different values for intendedUse increases, other use cases will become available.

Curated list of publicly available big data datasets - niderhoff/big-data-datasets.

The indices are created using a repeat-sales methodology.

A long, categorized list of large datasets (available for public use) to try your analytics skills on.

This post will be a bit different, in that we are looking at the top open dataset repositories that GitHub has to offer.

The script MUST be runnable from a command line interface.

Various interesting datasets, mostly data from the University of Illinois - wadefagen/datasets.

Awesome Public Datasets on GitHub.

The dataset was created from the public GitHub dataset on Google BigQuery.
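The repeat-sales idea can be illustrated with the classic Bailey-Muth-Nourse regression: each pair of sales of the same house contributes one equation log(p_later / p_earlier) = b_later - b_earlier, the base period's b is fixed at 0, and the index level is 100·exp(b). This is a simplified, unweighted sketch for intuition only; S&P's published Case-Shiller methodology adds weighting and other refinements not shown here. The small pure-Python solver just avoids third-party dependencies, and the house prices are hypothetical.

```python
import math


def _solve(a, b):
    """Solve the square system a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some period is not linked to the base period")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def repeat_sales_index(pairs, n_periods):
    """Unweighted repeat-sales index with base period 0 = 100.

    pairs: iterable of (first_period, first_price, second_period, second_price).
    """
    k = n_periods - 1                    # unknowns b_1..b_{n-1}; b_0 is fixed at 0
    xtx = [[0.0] * k for _ in range(k)]  # normal equations X'X ...
    xty = [0.0] * k                      # ... and X'y
    for s, p_s, t, p_t in pairs:
        row = [0.0] * k
        if t > 0:
            row[t - 1] += 1.0            # +1 for the later sale's period
        if s > 0:
            row[s - 1] -= 1.0            # -1 for the earlier sale's period
        y = math.log(p_t / p_s)
        for i in range(k):
            if row[i]:
                xty[i] += row[i] * y
                for j in range(k):
                    xtx[i][j] += row[i] * row[j]
    betas = _solve(xtx, xty)
    return [100.0] + [100.0 * math.exp(beta) for beta in betas]


# Three hypothetical houses, each sold twice across periods 0, 1 and 2.
sales = [(0, 200_000, 1, 220_000), (1, 100_000, 2, 110_000), (0, 300_000, 2, 363_000)]
print([round(v, 2) for v in repeat_sales_index(sales, 3)])  # [100.0, 110.0, 121.0]
```

Only houses sold at least twice contribute, which is what lets the method sidestep differences in quality between different houses.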
Contribute to datasets/oil-prices development by creating an account on GitHub.

GeoIP2 - free IP geolocation database.

Data is kept up to date automatically by bots that collect and update it once a week.

Supports default & custom datasets for applications such as summarization and Q&A.

On the other hand, clustering datasets by topic is a good way of measuring diversity.

They can be opened by any application compatible with CSV files or with a CSV editor. Mostly generated using Python Faker.

DataSets helps make data wrangling code more reusable.

If you use tfds new to create a new dataset, the dataset class is stored in a file with the exact same name as the dataset, i.e. new_dataset.py.

In this repository, the overall dataset chart is represented as a simplified version.

The data also shows the country's population and the number of cases per 100,000 people on a rolling 14-day average.

CSV Datasets.

This repository houses a comprehensive collection of datasets, theoretical frameworks, video materials, and visualizations pertaining to The Synthetic Party, offering insights into the overlap of artificial intelligence and political discourse.

Demo apps to showcase Meta Llama3 for

CodeSearchNet is a collection of datasets and benchmarks that explore the problem of code retrieval using natural language.

Supports a number of inference solutions, such as HF TGI and vLLM, for local or cloud deployment.

A collection of large datasets containing questions and their answers for use in Natural Language Processing tasks like question answering (QA).
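The "cases per 100,000 people on a rolling 14-day average" figure can be computed from a daily case series and a population count. A minimal sketch, assuming daily counts are already in date order with no missing days; some publishers instead report the 14-day cumulative total per 100,000, which is this value multiplied by 14. The population and case numbers are hypothetical.

```python
from collections import deque


def rolling_rate_per_100k(daily_cases, population, window=14):
    """Rolling `window`-day average of daily cases per 100,000 people.

    Returns None for the first window-1 days, before a full window exists.
    """
    rates, buf, total = [], deque(), 0
    for cases in daily_cases:
        buf.append(cases)
        total += cases
        if len(buf) > window:
            total -= buf.popleft()  # drop the day that slid out of the window
        if len(buf) == window:
            # average = total / window, then scale to a per-100,000 rate
            rates.append(total * 100_000 / (window * population))
        else:
            rates.append(None)
    return rates


# Hypothetical country of 5 million people.
daily = [50] * 13 + [120, 400]
print(rolling_rate_per_100k(daily, population=5_000_000)[-2:])  # [1.1, 1.6]
```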
The datasets are provided in the usual sklearn.datasets format, with missing data imputation and the encoding of categorical and ordinal features.

Based on the evaluation of the different data themes discussed in the

Learn how to download files from GitHub, and understand the difference between downloading, cloning, and forking.

For example, from your laptop to the cloud, to another user's machine, or to an HPC system.

The Python Software Foundation's PyPI dataset can be used to analyze download requests for Python packages.

Datasets - Statistics, Unidad de Análisis Financiero (Financial Analysis Unit).

CORGIS: The Collection of Really Great, Interesting, Situated Datasets.

The data set contains daily reports of COVID-19 cases and deaths in countries worldwide.

Contribute to datasets/geoip2-ipv4 development by creating an account on GitHub.

Datasets is a community library for contemporary NLP designed to support this ecosystem.

WIT is composed of a curated set of 37.6 million entity-rich image-text examples with 11.5 million unique images across 108 Wikipedia languages.

You might also want to have a look at Florian Boudin's keyphrase extraction repository.

In this project, employing our new framework JailbreakHub, we conduct the first measurement

The dataset is split into a training set (391K images), a validation set (34K images), and a test set (67K images).

Curated open data has 144 repositories available.

Datasets are split into three categories: Customers, Users and Organizations.

Curated list of quality open datasets.

Putting data to work means making data more accessible and easier to share.

Financial datasets for LLMs 🧪.

Among various SLAM datasets, we've selected the datasets that provide pose and map information.

github_nested: Contains a timeline of actions such as pull requests and

Natural Scene Text: The images in this type of dataset are usually taken in natural scenes, so the difficulty of this task lies in the complex lighting transformations, shooting angles, blurring, varied fonts, etc.
For example, Epi-validated outbreak datasets might be used with a SNP-based or MLST-based workflow.

Brent crude and WTI oil prices from the US EIA.

If you know more datasets and want to contribute, please notify me.

Click on a CSV name to download it, and let us know what you do with it by emailing us.

West Point NSA Data Sets - Snort Intrusion Detection Log.

Method of calculation: Since 1 April 1968, calculated from the daily morning fixing; from January 1950 to 21 March 1954, calculated using the Bank of England's gold purchasing price (1 ounce of fine gold = £12.40) in connection with the average exchange rate for the pound in New York. General: 1 ounce of fine gold = 31.1034768 g.

Also links to the MAL dataset.

Please see the guidelines. The workflows are:
build*: triggers on every push to the master branch or to an open pull request. It runs the data preparation step, and tests and commits a new version of the prepared data if there are any changes.
acquire-<acquirer>.yml: triggers on a schedule. It runs the acquirer and commits the acquired data to the corresponding raw location.

Easy access to official spatial data sets of Brazil in R and Python.

World Bank Data

This data set consists of monthly stock price, dividends, and earnings data and the consumer price index (to allow conversion to real values), all starting January 1871.

Its size enables WIT to be used as a pretraining dataset for

The videos also contain AR session metadata, including camera poses, sparse point clouds and planes.

The public datasets are datasets that BigQuery hosts for you to access and integrate into your applications.

Document Text: focuses only on document images; the difficulty is the variety of typesetting.

Otherwise, how you use a dataset is up to you!

The post was inspired by the GitHub Open Data Showcase, which is good, but not very large.

Thanks for your contribution to the ML community!

Pre-trained Model License.
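The method note above converts between sterling, dollars, and ounces, and a (troy) ounce of fine gold is 31.1034768 g. A small sketch of those conversions; the 2.80 USD/GBP exchange rate in the example is an illustrative placeholder, not a value taken from the dataset.

```python
TROY_OUNCE_IN_GRAMS = 31.1034768  # 1 ounce of fine gold = 31.1034768 g


def per_gram(price_per_ounce):
    """Convert a price quoted per (troy) ounce of fine gold into a price per gram."""
    return price_per_ounce / TROY_OUNCE_IN_GRAMS


def gbp_to_usd(amount_gbp, usd_per_gbp):
    """Convert a sterling amount to US dollars at the given exchange rate."""
    return amount_gbp * usd_per_gbp


# 1950-1954 method: BoE purchasing price (£12.40/oz) combined with the pound's New York rate.
# 2.80 USD/GBP is an illustrative placeholder, not a figure from the dataset.
usd_per_ounce = gbp_to_usd(12.40, usd_per_gbp=2.80)
print(round(usd_per_ounce, 2), round(per_gram(usd_per_ounce), 4))
```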
key_value_pair type is a generalized type, where a sample can be one or multiple images with optional text, labeled with key-value

Important note: The CARRADA dataset will be downloaded and unzipped in the Docker image.

Some example data sets include FCC comments and data

Comprehensive country code information, including ISO 3166 codes, ITU dialing codes, ISO 4217 currency codes, and many others - datasets/country-codes.

CrowdData is an open repository that aggregates crowdsourced datasets that include individual crowd votes.

Welcome to awesome-emg-data, a curated list of Electromyography (EMG) datasets and scholarly publications designed for researchers, practitioners, and enthusiasts in the field of biomedical engineering, neurology, kinesiology, and related disciplines.

Objectron is a dataset of short, object-centric video clips.

Global temperature anomaly data come from the Global Historical Climatology Network-Monthly (GHCN-M) data set and the International Comprehensive Ocean-Atmosphere Data Set (ICOADS), which have data from 1880 to the present.

Awesome Public Datasets is an open-source list of topic-centric public datasets.

Common repository for example datasets used by Vega-related projects - vega/vega-datasets.

nf-core is a collection of high-quality Nextflow pipelines.

Domain Name Service Logs.

The release of the FreebaseQA data set (NAACL 2019).

Please include the following citation when using tensorflow-datasets for a paper, in addition to any citation specific to the datasets used.

We want to make it easy to relocate an algorithm between different data storage environments without code changes.

Contribute to OTRF/Security-Datasets development by creating an account on GitHub.
Each row of the table represents an iris flower, including its species and the dimensions of its botanical parts.

SHERLOCK - The dataset is essentially a massive time-series dataset spanning nearly every kind of software and hardware sensor that can be sampled from a Samsung Galaxy S5 smartphone without root privileges.

The vision of developing this repository is to have an exhaustive list of high-quality datasets, APIs and projects pertaining to COVID-19 which will be open to communities

The dataset consists of 22,325 bot accounts (of which 18,444 are unique) and 107,598 users separated into 3 groups: random - random accounts (low social homophily); shift - accounts from communities that show some activity in VK (medium social homophily); student - verified users who are students of one university (high social homophily).

This list states the country names (official short names in English) in alphabetical order as given in ISO 3166-1 and the corresponding ISO 3166-1-alpha-2 code elements.

Figure 1: Examples of DeepFashion2.

ISOLET Data Set - This 38.7 GB dataset helps predict which letter-name was spoken, a simple classification task.

The top open dataset repositories on GitHub include a variety of data, freely available for use by researchers, practitioners, and students alike.

This research is a continuation of some ideas presented in this blog post and is a joint

CSV datasets for ML/AI models from captured network traffic during ZAP scanning with web applications like Django, Flask, React, Vue and Spring - Anti-Nex training datasets.

Explore and download over 1200 datasets from various R packages and learn how to use them for statistical analysis and visualization.

Examples of DeepFashion2 are shown in Figure 1.
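Because each Iris row is one flower with its species and measurements, per-species summaries are a natural first exploration. A minimal standard-library sketch; the header names follow the common `sepal_length, ..., species` layout, which is an assumption (the original UCI file has no header row), and only a handful of rows are embedded here for illustration.

```python
import csv
import io
from collections import defaultdict

# A few rows in the Iris layout (measurements in cm); the full dataset has 150 rows.
IRIS_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""


def mean_by_species(csv_text, column):
    """Average one measurement column separately for each species."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        sums[row["species"]] += float(row[column])
        counts[row["species"]] += 1
    return {species: sums[species] / counts[species] for species in sums}


print(mean_by_species(IRIS_CSV, "petal_length"))
```

Grouped means like these are the same comparison a scatter plot of petal length by species makes visually.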
List of Datasets:
Animal Recognition Demo: contains details on Animal Recognition datasets.
Edge Real time Video Analytics: contains details on Edge Real time Video Analytics datasets.
Chat Bot Demo: contains details on Chat Bot Demo datasets.
Redis AI Showcase: contains details on Redis AI Showcase datasets.

A Python library for downloading datasets from Kaggle, Google Drive, and other online sources, available on GitHub.

(NLVR) A Corpus of Natural Language for Visual Reasoning, 2017
(MS MARCO) MS MARCO: A Human Generated MAchine Reading COmprehension Dataset, 2016
(NewsQA) NewsQA: A Machine Comprehension Dataset, 2016
(SQuAD) SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016
(GraphQuestions) On Generating

Description: This dataset provides comprehensive agricultural crop data spanning the years 2010 to 2017 for all states across India.

Contribute to Ayushi0214/Datasets development by creating an account on GitHub.

Historical Document Text: usually designed for assisting

As per the home page for Indices on the S&P website: The S&P/Case-Shiller U

Following are the datasets and the original papers that proposed them. Binary label.

Those who have the resources to collect data lack the incentive to (corollary: often those who have access to a dataset are the

The dataset contains over 600 billion data points in over 10 billion data records.

VITON-HD (download form, paper): a high-resolution virtual try-on dataset with 13,679 image pairs at 1024 x

multitask type is a composition type, where one set of images has multiple sets of annotations available for different tasks, where each task can be of any basic type.
This is a list of histopathology datasets made public for classification, segmentation, regression and/or registration tasks.

"DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection", CVPR 2020: Paper, GitHub
"MaskGAN: Towards Diverse and Interactive Facial Image Manipulation", CVPR 2020: Paper, GitHub

ISO 3166-1-alpha-2 English country names and code elements.

Statistical series on the results obtained by the service in performing its functions of financial intelligence, regulation, compliance supervision, rule-making, and outreach activities for the prevention of LA/FT (money laundering

Twitch-full contains the full dataset while Twitch-100k is a

Tool to convert datasets from "Benchmark Data Sets for Graph Kernels" (K. Kersting et al., 2016) into a format suitable for deep learning research.

Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model.

The code is available in this GitHub repository.

Each object is annotated with a 3D bounding box.

Contribute to openimages/dataset development by creating an account on GitHub.

This repository contains a copy of machine learning datasets used in tutorials on MachineLearningMastery.com.

List of Key Databases: This document is the result of the action «Identification of key databases» defined within the Open Data section of the Three-Year Plan for IT in the Public Administration (2017-2019).

Each line represents a C/C++ code unit (file, function, or class) and is associated with a binary label (non

HierText is the first dataset featuring hierarchical annotations of text in natural scenes and documents.

Web Attack Payloads - A collection of web attack payloads.

Contribute to datasciencedojo/datasets development by creating an account on GitHub.
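Since each line holds one C/C++ code unit with a binary label, the file can be read as JSON Lines. A minimal sketch; the `code` and `label` field names and the 0/1 encoding are assumptions for illustration, so check the actual dataset's schema before relying on them.

```python
import json
from collections import Counter


def load_code_units(lines):
    """Parse JSON Lines records (one code unit per line), skipping blank lines."""
    units = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Hypothetical schema: "label" must be 0 or 1.
        if record.get("label") not in (0, 1):
            raise ValueError(f"line {lineno}: expected a binary label, got {record.get('label')!r}")
        units.append(record)
    return units


sample = [
    '{"code": "int add(int a, int b) { return a + b; }", "label": 0}',
    '{"code": "char buf[8]; strcpy(buf, input);", "label": 1}',
    "",
]
units = load_code_units(sample)
print(Counter(u["label"] for u in units))
```

Counting labels first is a quick way to spot class imbalance before training a classifier.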
We aim to provide the available datasets in a standard format (explained in the Download section below) so that they can be used directly in experiments, without any preprocessing workload.

Machine-Learning-driven-Web-Application-Firewall - Set of good and bad queries to a web application firewall.

This allows customers to better understand how secure their SharePoint is, maintain information

There is a field intendedUse which suggests how a particular dataset might be used.

This is the official repository for the ACM CCS 2024 paper "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models by Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, and Yang Zhang.

The Security Datasets project is an open-source initiative that contributes malicious and benign datasets, from different platforms, to the infosec community to expedite data analysis and threat research.

Core meta for awesome-public-datasets.

Ideally, I would like to make a list of the top open datasets on GitHub, period; however, this gets tricky, since searching for "open data,"

Datasets, Transforms and Models specific to Computer Vision - pytorch/vision.

Sampled Wikipedia passages are provided to an LLM (PaLM-2) using the novel summarize-then-ask prompting (SAP) method.
For our purposes, creating a custom TFDS dataset means writing a new class which inherits from tfds.core.GeneratorBasedBuilder.

I made a good faith effort to determine the license under which the actual data (i.e. rows/columns of numbers) were distributed, but I was unable to find a

A repository of pretty cool datasets that I collected for network science and machine learning research.

All of these recommendation datasets can be converted to the atomic files defined in RecBole, which is a unified, comprehensive and efficient recommendation library.

They start with a

The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling 1 TB of data.

Even when there are several social media platforms to get data from, the construction of a balanced labeled

Case-Shiller Index of US residential house prices.

PurpleAir Air Quality Data - Developer API for accessing PurpleAir

For example, the GH Archive dataset can be used to analyze public events on GitHub, such as pull requests, repository stars, and opened issues.

Dataset / Download Link / Code:
VGTSR (Baidu): Download, code pfsm
VGTSR (Google Drive): Download, no code

Datasets (name, link, optional description): ESOL; Lipophilicity; SIDER; OFFSIDES; HIA (absorption).

Users can find data analysis, libraries, and guides, all open source.

Normally accessed via HTTP or WebSockets.

The price, dividend, and earnings series are from the same sources as described in Chapter 26 of my earlier book (Market Volatility [Cambridge, MA: MIT Press, 1989]), although

The Synthetic Party (Det Syntetiske Parti) stands as the world's first AI-driven political party.
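The consumer price index in the monthly stock-market data set exists to convert nominal series into real ones: real value = nominal value × (CPI in the base period / CPI in the observation period). A minimal sketch with made-up numbers; using the most recent month as the base period is a common convention, not something the text prescribes.

```python
def to_real(nominal, cpi, base_cpi):
    """Convert nominal values to real terms: real = nominal * base_cpi / cpi."""
    if len(nominal) != len(cpi):
        raise ValueError("nominal and CPI series must have the same length")
    return [value * base_cpi / index for value, index in zip(nominal, cpi)]


# Hypothetical monthly prices and CPI readings; the latest CPI is used as the base.
prices = [10.0, 12.0, 15.0]
cpi = [100.0, 120.0, 150.0]
print(to_real(prices, cpi, base_cpi=cpi[-1]))  # [15.0, 15.0, 15.0]
```

Here the nominal rise from 10 to 15 is entirely inflation, so the real series is flat.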
ir_datasets is a Python package that provides a common interface to many IR ad-hoc ranking benchmarks, training datasets, etc.

How to use it: The GitHub Code dataset is a very large dataset, so for most use cases it is recommended to make use of the

A public dataset is any dataset that is stored in BigQuery and made available to the general public through the Google Cloud Public Dataset Program.

EPIC-KITCHENS: Scaling Egocentric Vision: The

To accompany the presentation of the VTAB+MD paper at NeurIPS 2021's Datasets and Benchmarks track, we are releasing a TensorFlow Datasets-based implementation of Meta-Dataset's input pipeline which is compatible with both the original Meta-Dataset protocol (MD-v1) and the updated protocol designed for VTAB+MD (MD-v2).

This list is updated whenever a change to the official code list in ISO 3166-1 is effected by the ISO 3166/MA.

Log Server Aggregate Log.

The passages are then provided to PaLM-2 along with a prompt that asks the model to summarize the

Wikipedia-based Image Text (WIT) Dataset is a large multimodal multilingual dataset.

The SWIM-IR dataset is generated by first sampling passages from Wikipedia.

Internet-Wide Scan Data Repository - The Censys

The Open Images dataset.

In each video, the camera moves around and above the object and captures it from different views.

Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples. Eleni Triantafillou, Tyler Zhu, Vincent Dumoulin, Pascal Lamblin, Utku Evci, Kelvin Xu, Ross Goroshin, Carles Gelada, Kevin Swersky, Pierre-Antoine Manzagol, Hugo Larochelle.

Congratulations! You used BigQuery and SQL to query the GitHub public dataset.
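GH Archive data, mentioned above as a source for pull requests, repository stars, and opened issues, can also be processed locally as newline-delimited JSON. A minimal sketch: it relies on the GitHub event `type` values `PullRequestEvent`, `WatchEvent` (how a star is recorded), and `IssuesEvent`, and on `payload.action == "opened"` to isolate newly opened issues. The sample records are hypothetical and stripped down to just those fields.

```python
import json
from collections import Counter


def count_events(json_lines, repo=None):
    """Count pull request events, stars, and newly opened issues in GH Archive-style records.

    json_lines: iterable of JSON strings, one event per line.
    repo: optional "owner/name" filter.
    """
    counts = Counter()
    for line in json_lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if repo is not None and event.get("repo", {}).get("name") != repo:
            continue
        kind = event.get("type")
        if kind == "WatchEvent":  # a star is recorded as a WatchEvent
            counts["stars"] += 1
        elif kind == "PullRequestEvent":
            counts["pull_requests"] += 1
        elif kind == "IssuesEvent" and event.get("payload", {}).get("action") == "opened":
            counts["opened_issues"] += 1
    return counts


sample = [
    '{"type": "WatchEvent", "repo": {"name": "octo/demo"}, "payload": {"action": "started"}}',
    '{"type": "IssuesEvent", "repo": {"name": "octo/demo"}, "payload": {"action": "opened"}}',
    '{"type": "IssuesEvent", "repo": {"name": "octo/demo"}, "payload": {"action": "closed"}}',
    '{"type": "PullRequestEvent", "repo": {"name": "octo/other"}, "payload": {"action": "opened"}}',
    '{"type": "PushEvent", "repo": {"name": "octo/demo"}, "payload": {}}',
]
print(count_events(sample))
print(count_events(sample, repo="octo/demo"))
```

For a downloaded, gzip-compressed archive file, `gzip.open(path, "rt")` yields text lines that can be passed straight in as `json_lines`.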
The package takes care of downloading datasets (including documents, queries, relevance judgments, etc.) when available from

Information oversharing is a security and compliance use case powered by our newly available SharePoint datasets.

or do not want your dataset to be included in this library, please get in touch through a GitHub issue.

Generate a dataset. Under the corresponding MITRE Technique ID folder, create a folder named after the tool the dataset comes from, for example atomic_red_Team. Make a PR with the <tool_name_yaml>.yml file under the corresponding created folder.

Contribute to datasets/awesome-data development by creating an account on GitHub.

Collected and sorted from various blogs, answers, and user feedback, it combines free and paid data sets on physics, sports, software, natural language, and machine learning.

Sample data. Flexible Data Ingestion.

Data comes from S&P Case-Shiller data and includes both the national index and the indices for 20 metropolitan regions.

benedekrozemberczki/datasets.

Contribute to mwaskom/seaborn-data development by creating an account on GitHub.

Contribute new data here!

Contribute to datasets/covid-19 development by creating an account on GitHub.

Text lines are defined as connected sequences of words that are aligned in

COVID-19 - Datasets, APIs and List of Projects.

The datasets are generated using random values.

Search our open datasets and learn how other researchers have used them to solve societal challenges.

This repository is a collection of SLAM-related datasets.

Create full-fledged APIs for slowly moving datasets without writing a single line of code.
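The sample CSV files described in this collection are generated from random values (mostly with Python Faker). A minimal stdlib-only sketch of the same idea, swapping Faker for the `random` module; the column names and value pools are hypothetical, and a fixed seed keeps the output reproducible.

```python
import csv
import io
import random

# Hypothetical value pools; Faker would supply far richer fake data.
FIRST_NAMES = ["Ana", "Ben", "Chen", "Dara", "Eli"]
COUNTRIES = ["Brazil", "Italy", "India", "Canada"]


def fake_customers_csv(n_rows, seed=0):
    """Generate `n_rows` synthetic customer records as CSV text (header included)."""
    rng = random.Random(seed)  # seeded so the same call always yields the same file
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["customer_id", "first_name", "country", "signup_year"])
    for i in range(1, n_rows + 1):
        writer.writerow([i, rng.choice(FIRST_NAMES), rng.choice(COUNTRIES), rng.randint(2015, 2024)])
    return out.getvalue()


text = fake_customers_csv(100)
rows = list(csv.reader(io.StringIO(text)))
print(rows[0], len(rows) - 1)
```

The same function scales from 100 rows toward millions, though at that size writing directly to a file is preferable to building one string in memory.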
Datasets and resources listed here should all be openly accessible for research purposes, requiring, at most, registration

This is the "Iris" dataset.

Voice Gender Detection - GitHub repo for voice gender detection using the VoxCeleb dataset (7000+ unique speakers and utterances, 3683 males / 2312 females).

This repository exists only to provide a convenient target for the seaborn.load_dataset function to download sample datasets.

Our goal is to facilitate the discovery and accessibility of high-quality EMG data and cutting

Learn more about the tools, frameworks and templates available to enable more secure and streamlined access to data.

Charades-Ego: Actor and Observer: Joint Modeling of First and Third-Person Videos (CVPR 2018): 112 people, 4000 paired videos, 157 action classes.

The principle for nf-core test data is as small as possible, as large as necessary.

I think it helps to have

Dataset and Evaluation toolbox: GitHub, Google Drive. Remote Sensing Datasets.

And vote on the current set of requests by adding a thumbs-up reaction to the issue.

Uncompressed size in brackets.

100DOH: Understanding Human Hands in Contact at Internet Scale (CVPR 2020): 131 days of footage, 100K annotated hand-contact video frames.

By Anmol Rajpurohit on April 6, 2015 in Datasets, Finance, GitHub, Government, Machine Learning, NLP, Open Data, Time series data.

DressCode: A dataset focused on modeling the underlying 3D geometry and appearance of a person and their garments given a few or a single image.

This GitHub repo boasts a variety of datasets, such as climate data, time series data, plane crash data, etc.

A large social network of GitHub developers which was collected from the public API in June 2019.
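The GitHub developer network described in this collection (nodes are developers who have starred at least 10 repositories, edges are mutual follower relationships) can be derived from raw follow data. A minimal sketch with hypothetical inputs: `follows` as (follower, followee) pairs and `stars` as a per-developer count of starred repositories. Whether developers with no mutual follows are kept as isolated nodes is left to the caller.

```python
def mutual_follow_graph(follows, stars, min_stars=10):
    """Build an undirected developer graph from directed follow pairs.

    follows: iterable of (follower, followee) pairs
    stars:   dict mapping developer -> number of repositories starred
    Returns (nodes, edges): eligible developers, and sorted pairs that follow each other.
    """
    nodes = {dev for dev, n in stars.items() if n >= min_stars}
    directed = {(a, b) for a, b in follows if a in nodes and b in nodes and a != b}
    # Keep an edge only when the follow goes both ways; sort to drop the direction.
    edges = {tuple(sorted((a, b))) for a, b in directed if (b, a) in directed}
    return nodes, edges


follows = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"), ("cat", "dan"), ("dan", "cat")]
stars = {"ann": 12, "bob": 30, "cat": 10, "dan": 3}
nodes, edges = mutual_follow_graph(follows, stars)
print(sorted(nodes), sorted(edges))
```

In this toy input, cat and dan follow each other but dan has too few stars, and ann's follow of cat is one-way, so only ann and bob end up connected.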
This list is inspired by awesome public datasets, but for real-time datasets and sources.

The allocated space for an image can be modified using the option --memory="100g".

BuzzFeed also makes data available to the public through its GitHub page.

Data: Information on the S&P 500 index used to be available on the official Standard and Poor's website, but until they publish it again, Wikipedia's SP500 list of companies is the best up-to-date and open data source.

Download Open Datasets on 1000s of Projects + Share Projects on One Platform.

Contribute to virattt/financial-datasets development by creating an account on GitHub.

This repository contains various files for CI and unit testing of nf-core pipelines and infrastructure.

This repository is linked to the Google site.

I would be happy if you want to help me update and/or improve this document.

Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small datasets as

Datasets on GitHub: it hosts tons of awesome datasets.

Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine-learned image captioning systems.