
FileMarket | Text Recognition Data | 50,000 Images | Computer Vision Data | AI Model Training Data | Textual data | Annotated Imagery Data

Pre-collected OCR datasets include images of natural scenes, handwritten texts, bills and documents, and test papers. The AI training data spans 20 languages, various natural environments, and diverse photographic angles.

Sample fields (values masked in the preview): ImageID, Environment, Language, Angle, Device, ImageFormat, AnnotationFormat
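
As a rough illustration of how the sample fields above might be handled in code, the sketch below models one row as a Python record and filters rows by language and angle. The field names come from the sample header; the snake_case attribute names, the string types, the `select` helper, and all example values are assumptions, since the preview masks the actual values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ImageRecord:
    """One row of the sample table. All types are assumed to be strings."""
    image_id: str
    environment: str
    language: str
    angle: str
    device: str
    image_format: str
    annotation_format: str


def select(records: Iterable[ImageRecord],
           language: Optional[str] = None,
           angle: Optional[str] = None) -> List[ImageRecord]:
    """Return records matching the given language and/or angle (None = any)."""
    return [
        r for r in records
        if (language is None or r.language == language)
        and (angle is None or r.angle == angle)
    ]


# Hypothetical rows; the real values are not visible in the preview.
rows = [
    ImageRecord("0000001", "shop sign", "English", "eye-level", "cellphone", "jpg", "json"),
    ImageRecord("0000002", "menu", "English", "looking down", "camera", "jpg", "json"),
    ImageRecord("0000003", "road sign", "Spanish", "looking up", "cellphone", "jpg", "json"),
]

english_rows = select(rows, language="English")
```

Filtering on these metadata fields is one way to check how evenly a delivery covers languages and photographic angles before training.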

Description

Annotated Imagery Data

FileMarket provides an Annotated Imagery Data set designed for a range of computer vision and machine learning tasks. This dataset is part of our wider offering, which also includes Textual Data, Object Detection Data, Large Language Model (LLM) Data, and Deep Learning (DL) Data. Each category is built to provide high-quality, comprehensive datasets for AI development.

Specifications:

  • Data Size: 50,000 images
  • Collection Environment: The images cover a wide array of real-world scenarios, including shop signs, stop boards, posters, tickets, road signs, comics, cover pictures, prompts/reminders, warnings, packaging instructions, menus, building signs, and more.
  • Diversity: The dataset spans 5 languages and includes images from various natural scenes captured at multiple photographic angles (looking up, looking down, eye-level).
  • Devices Used: Images are captured using cellphones and cameras, reflecting real-world usage.
  • Image Parameters: All images are provided in .jpg format, and the corresponding annotation files are in .json format.
  • Annotation Details: The dataset includes line-level quadrilateral bounding box annotations and text transcriptions.
  • Accuracy: Each vertex of a quadrilateral bounding box is accurate to within 5 pixels, and bounding box accuracy is at least 97%. Text transcription accuracy is also at least 97%.

Unique Data Collection Method

FileMarket uses a community-driven approach to collect data, drawing on our network of over 700k users across various Telegram apps. This method keeps our datasets diverse, applicable to real-world use, and ethically sourced, with full participant consent. As a result, the datasets are both comprehensive and reflective of real-world scenarios, so that your AI models are trained on relevant and diverse data.
By integrating our unique data collection method with the specialized categories we offer, FileMarket is committed to providing high-quality data solutions that support and enhance your AI and machine learning projects.
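
To make the accuracy specification above more concrete, the following Python sketch shows one way a buyer might check line-level quadrilateral annotations against a 5-pixel vertex tolerance. The JSON layout (`lines`, `text`, `quad`), the vertex ordering, the use of Euclidean distance as the error measure, and the function names are all assumptions for illustration; the actual annotation schema and the vendor's scoring procedure are not described on this page.

```python
import json
import math

# Hypothetical annotation file content; the real .json schema may differ.
SAMPLE_ANNOTATION = json.loads("""
{
  "image": "example.jpg",
  "lines": [
    {"text": "OPEN 24 HOURS",
     "quad": [[10, 20], [210, 22], [209, 60], [11, 58]]}
  ]
}
""")


def max_vertex_error(pred_quad, ref_quad):
    """Largest Euclidean distance (in pixels) between matching vertices."""
    if len(pred_quad) != 4 or len(ref_quad) != 4:
        raise ValueError("each quadrilateral needs exactly 4 vertices")
    return max(math.dist(p, r) for p, r in zip(pred_quad, ref_quad))


def quad_within_tolerance(pred_quad, ref_quad, tol_px=5.0):
    """True when every vertex lies within tol_px of its reference vertex."""
    return max_vertex_error(pred_quad, ref_quad) <= tol_px


def box_accuracy(pairs, tol_px=5.0):
    """Fraction of (annotated, reference) quad pairs within tolerance."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no quadrilateral pairs to score")
    hits = sum(quad_within_tolerance(p, r, tol_px) for p, r in pairs)
    return hits / len(pairs)


reference = SAMPLE_ANNOTATION["lines"][0]["quad"]
# An annotation shifted by (3, 4) pixels: each vertex is exactly 5 px off.
shifted_ok = [[x + 3, y + 4] for x, y in reference]
# An annotation shifted by (4, 4) pixels: each vertex is about 5.66 px off.
shifted_bad = [[x + 4, y + 4] for x, y in reference]

accuracy = box_accuracy([(shifted_ok, reference), (shifted_bad, reference)])
```

Running a check like this on a small, independently re-annotated sample is one way to compare a delivery against the stated 97% figure, provided both sides agree on the vertex order and the distance measure.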

Country Coverage

(160 countries)
Africa (58)
Asia (50)
Europe (52)

Data Categories

  • Annotated Imagery Data
  • Deep Learning (DL) Data
  • Textual data
  • Large Language Model (LLM) Data
  • Object Detection Data

Pricing

Pricing available upon request

Volumes

50K images

Does this product fit your data needs?

Get in touch with our team to start unlocking your data solutions.
