Open Source Data Quality and Profiling

By: arrah, arunwizz
This project is dedicated to open source data quality and data preparation solutions. Data quality includes profiling, filtering, governance, similarity checks, data enrichment and alteration, real-time alerting, basket analysis, bubble charts, warehouse validation, single customer view, etc., as defined by strategy.
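The page does not say how profiling is computed. As a rough illustration only, the following Python sketch computes a few statistics a column profile commonly reports: null count, distinct count, length range, and character-pattern frequencies. The function names `profile_column` and `to_pattern` are hypothetical and not part of this project, and treating empty strings as nulls is an assumption.

```python
from collections import Counter


def to_pattern(value: str) -> str:
    """Map a value to a character-class pattern: letters -> 'A', digits -> '9', others kept."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Return basic profile statistics for one column of string values.

    Assumption: both None and the empty string are counted as nulls.
    """
    non_null = [v for v in values if v is not None and v != ""]
    lengths = [len(v) for v in non_null]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        # Most frequent patterns first, e.g. [("AA-99", 2), ("A-9", 1)]
        "patterns": Counter(to_pattern(v) for v in non_null).most_common(),
    }
```

For `["AB-12", "CD-34", None, "E-5", ""]` this reports 2 nulls, 3 distinct values, and the patterns `AA-99` (twice) and `A-9` (once).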

The tool is being developed into a high-performance integrated data management platform that will seamlessly handle Data Integration, Data Profiling, Data Quality, Data Preparation, Dummy Data Creation, Metadata Discovery, Anomaly Discovery, Data Cleansing, Reporting and Analytics.

It also has Hadoop (big data) support to move files to and from a Hadoop grid and to create, load and profile Hive tables. This project is also known as "Aggregate Profiler".

A RESTful API for this project is being built (beta version) at https://sourceforge.net/projects/restful-api-for-osdq/

An Apache Spark based data quality tool is being built at https://sourceforge.net/projects/apache-spark-osdq/

Features

  • MySQL, Oracle, Postgres, Access, DB2, SQL Server certified; Big Data support: Hive
  • Create Hive tables, Profile Hive tables, Move files to/from the Profiler system and Hadoop grid
  • Fuzzy Logic based similarity check, Cardinality check between tables and files
  • Export to and import from XML, XLS or CSV format; PDF export
  • File Analysis, Regex search, Standardization, DB search
  • Complete DB Scan, SQL interface, Data Dictionary, Schema Comparison
  • Statistical Analysis, Reporting (dimension and measure based), Ad Hoc reports and Analytics
  • Pattern Matching, Deduplication, Case matching, Basket Analysis, Distribution Chart
  • Data generation, Data Preparation and Data masking features
  • Metadata Information, Reverse engineering of Data Model
  • Timeliness analysis, String length analysis, K-Means, Prediction, Regression
  • Address Correction, Single View of Customer, Product, Golden merge for records
  • Record Match, Linkage and Merge based on fuzzy logic
  • Format Creation, Format Matching (Phone, Date, String and Number), Format standardization
  • Data Preparation: Ordinal, Normalization, Bucketing, Regression
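The page states that similarity checks and record matching are based on fuzzy logic but does not name the algorithm. The sketch below swaps in one common approach, normalized Levenshtein edit distance with a match threshold; it is not necessarily what this project uses. `levenshtein`, `similarity`, `match_records` and the default threshold of 0.8 are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to a 0..1 score after trimming and case-folding."""
    a, b = a.strip().casefold(), b.strip().casefold()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def match_records(left, right, threshold=0.8):
    """Pair each left record with its most similar right record, if above the threshold."""
    pairs = []
    for record in left:
        scored = [(similarity(record, candidate), candidate) for candidate in right]
        if not scored:
            continue
        score, best = max(scored)
        if score >= threshold:
            pairs.append((record, best, round(score, 3)))
    return pairs
```

With this scoring, "Jon Smith" and "John Smith" score 0.9 (one edit over ten characters). Comparing every left record with every right record is quadratic, so larger data sets usually add a blocking step (for example, only comparing records that share a postcode) before scoring.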
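Format matching and standardization for phones, dates and numbers can be sketched with regular expressions and date parsing. The formats below (a 10-digit North American style phone number, plus `YYYY-MM-DD` and `DD/MM/YYYY` dates) are assumptions chosen for illustration, not formats the project defines, and `detect_format` and `standardize_phone` are hypothetical helper names.

```python
import re
from datetime import datetime

# Hypothetical format catalogue; a real tool would let users configure these.
# Order matters: a bare 10-digit value is reported as "phone" before "number".
FORMATS = {
    "phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    "number": re.compile(r"^[+-]?\d+(\.\d+)?$"),
}
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y"]


def detect_format(value: str) -> str:
    """Classify a value as 'date', 'phone', 'number' or 'string'."""
    v = value.strip()
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(v, fmt)  # also rejects impossible dates such as Feb 30
            return "date"
        except ValueError:
            pass
    for name, pattern in FORMATS.items():
        if pattern.match(v):
            return name
    return "string"


def standardize_phone(value: str):
    """Reduce a phone number to digits and reformat as NNN-NNN-NNNN, or None if not 10 digits."""
    digits = re.sub(r"\D", "", value)
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
```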
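The Data Preparation item lists ordinal encoding, normalization and bucketing. A minimal sketch of standard versions of these (min-max normalization, equal-width bucketing, and ordinal encoding with an optional explicit order) follows; the function names are hypothetical, and the project's own options may differ.

```python
def min_max_normalize(values):
    """Rescale numbers linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_buckets(values, n_buckets):
    """Assign each number a bucket index 0..n_buckets-1 using equal-width intervals."""
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1.0  # avoid division by zero for constant columns
    # Clamp so the maximum value lands in the last bucket rather than bucket n_buckets
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]


def ordinal_encode(values, order=None):
    """Map categories to integers, using an explicit order if given, else sorted order."""
    categories = order if order is not None else sorted(set(values))
    codes = {cat: i for i, cat in enumerate(categories)}
    return [codes[v] for v in values]
```

Passing an explicit `order` matters when categories have a natural ranking: plain alphabetical sorting would place "high" before "low".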

SmartPOS Advanced Point of Sale 100% Web

SmartPOS is a completely new OSGi plug-in that works inside SmartERP (Idempiere 2.1 Distro), taking all the power of an ERP but creating an intuitive, agile and easy-to-learn 100% Web Point of Sale (POS). SmartPOS has been designed to work as part of the ERP, and support complex business en.........

Similar: 13%

Toolsverse ETL Framework

ETL Framework is a standalone Extract Transform Load engine written in Java. It includes executables for all major platforms and can be easily integrated into other applications. Key Features: * embeddable, open source and free * fast and scalable * uses target database features to do transformatio.........

Similar: 13%

Sample Tracking

Help us to improve Freezer Web Access. We want to hear your feedback! Request new Freezer Web Access feature or module and receive a free Single User version with new feature. https://www.atgclabs.com/products/fw Freezer Web Access is a user friendly program designed to assist researchers with est.........

Similar: 12%

BIRT Report Designer

BIRT is an open source technology platform used to create data visualizations and reports that can be embedded into rich client and web applications. Developers who use BIRT Designer are able to access information from multiple data sources easily and quickly in order to create reports and applicati.........

Similar: 11%

XML Editor/Validator/Designer with CAMV

The CAM editor is the leading open source XML Editor/Validation/Schema toolset for rapidly building / deploying XML /JSON /Hibernate /SQL data /Forms applications. Visual WYSIWYG data design, rule entry wizards + drag & drop dictionary components. Will import, analyze / refactor from XML Schema / JS.........

Similar: 11%

Lab Storage

Help us to improve Freezer Web Access. We want to hear your feedback! Request new Freezer Web Access feature or module and receive a free Single User version with new feature. https://www.atgclabs.com/products/fw Freezer Web Access is a user friendly program designed to assist researchers with est.........

Similar: 11%

Lab Inventory

Request new Lab Inventory feature or module and receive a free Single User version with new feature. Help us to improve Lab Inventory. We want to hear your feedback! https://www.atgclabs.com/products/li The Lab Inventory System is an innovative, easy to learn solution for research laboratories. You.........

Similar: 10%

OpenUnderwriter (Insurance Distribution)

OpenUnderwriter is an open source software house specialising in the development of IT solutions for the insurance market. Specialists in the areas of eBusiness and component based development, the team has developed technology for a number of major insurance companies. The OpenUnderwriter platfor.........

Similar: 10%

torotools: Social DMS HRMS Time Tracking

torotools.es is a software suite of responsive design web products for managing your company's knowledge and talent. Based on Material Design. toro ECM tool is a free web based Enterprise Content Management, designed to help your company to improve the creation and management of information. It is .........

Similar: 9%

Lioness (Languages Interop Framework)

Framework for making Windows applications that are a single .exe file in AutoHotKey_L, C++, C#, VB.NET, Java, Groovy, Common Lisp, Nemerle, Ruby, Python, PHP, Lua, Tcl, Perl, Jint, S#, WSH VBScript, HTML/JavaScript/CSS, COM, PowerShell without compiling. For .NET 4....

Similar: 7%