Big data frameworks and products keep emerging, each with its own characteristics. The following is a comprehensive summary and comparative analysis of the most commonly used open source frameworks, Apache Hadoop, Spark and Flink:
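In the field of data warehousing there are two masters: Bill Inmon, the "father of the data warehouse", and Ralph Kimball, a leading data warehouse authority. Each wrote a classic book, Inmon's "Building the Data Warehouse" and Kimball's "The Data Warehouse Toolkit", and the two books represent two different approaches to building a data warehouse. These two architectural approaches have underpinned the development of data warehousing and business intelligence for nearly twenty years. Today, let's talk about…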
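Data layering is a very important part of data warehouse design; a well-designed set of layers makes the whole data system easier to understand and use.
However, most related articles that can be found online only mention layering in passing: they lack a clear and detailed explanation, a plan that can actually be implemented, or concrete examples.
Therefore, this article presents a general-purpose method for layering a data warehouse, covering the following:
Introduc…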
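The excerpt does not list the article's actual layers. As a minimal sketch only, assuming a hypothetical three-layer layout (raw, cleaned, summary) and SQLite through Python's built-in sqlite3 module, the idea of layering can be shown as each layer being built solely from the layer below it. The table names `raw_orders`, `clean_orders` and `summary_region_sales` are illustrative assumptions, not the article's naming.

```python
import sqlite3

# Hypothetical three-layer layout: raw -> cleaned -> summary.
# Layer and table names are illustrative assumptions, not taken from the article.
def build_layers(conn: sqlite3.Connection) -> None:
    cur = conn.cursor()
    # Raw layer: data lands exactly as it arrives from the source system.
    cur.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, region TEXT)")
    cur.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [("1", "10.5", "east"), ("2", "", "west"), ("3", "4.5", "east"), ("1", "10.5", "east")],
    )
    # Cleaned layer: drop rows with a missing amount, deduplicate, cast types.
    cur.execute(
        """
        CREATE TABLE clean_orders AS
        SELECT DISTINCT CAST(order_id AS INTEGER) AS order_id,
               CAST(amount AS REAL) AS amount,
               region
        FROM raw_orders
        WHERE amount <> ''
        """
    )
    # Summary layer: aggregates are built only from the cleaned layer.
    cur.execute(
        """
        CREATE TABLE summary_region_sales AS
        SELECT region, SUM(amount) AS total_amount, COUNT(*) AS order_count
        FROM clean_orders
        GROUP BY region
        """
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
build_layers(conn)
print(conn.execute("SELECT * FROM summary_region_sales ORDER BY region").fetchall())
```

Because consumers only query the summary layer, cleaning rules can change in one place without touching every downstream report.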
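1. Database design optimization
A. When optimizing queries, avoid full table scans wherever possible; first consider creating indexes on the columns used in WHERE and ORDER BY clauses.
B. Avoid testing a column for NULL in a WHERE clause where possible; in some database engines this causes the engine to skip the index and perform a full table scan, for example: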
select id from t where num is null
You can give num a default value of 0 so that the num column never contains NULL, and then this…
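A minimal sketch of tips A and B, assuming SQLite through Python's built-in sqlite3 module (other engines' planners behave differently): the table `t` and column `num` follow the example above, while the index name `idx_t_num` is a hypothetical choice. The column is declared `NOT NULL DEFAULT 0`, indexed, and then queried for the default value instead of NULL; `EXPLAIN QUERY PLAN` shows whether SQLite picks the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL DEFAULT 0 guarantees num never holds NULL.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER NOT NULL DEFAULT 0)")
# Index the column that appears in the WHERE clause (tip A).
conn.execute("CREATE INDEX idx_t_num ON t (num)")
conn.executemany("INSERT INTO t (num) VALUES (?)", [(1,), (2,)])
conn.execute("INSERT INTO t DEFAULT VALUES")  # num falls back to the default 0

# Query for the default value instead of testing for NULL (tip B).
rows = conn.execute("SELECT id FROM t WHERE num = 0").fetchall()

# Ask SQLite how it plans to run the query; the last column of each row is a text description.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT id FROM t WHERE num = 0").fetchall()
detail = " ".join(row[-1] for row in plan)
print(rows, detail)
```

Checking the plan, rather than assuming index use, is the safer habit, since whether a given predicate uses an index depends on the engine and its version.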
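Bad data is a pain point in every digital transformation, and every data professional needs to get to the bottom of it. Bad data distorts how data is interpreted and ultimately leads to poor decisions. Identifying bad data in an enterprise is therefore critical, but, predictably, it is no easy task.
Identifying bad data
Bad data can come from any area of the business, including departments such as sales, marketing or engineering, and it takes many different forms. Let's take a look…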
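The excerpt stops before listing the forms bad data takes, so the following is only a minimal sketch under assumed rules: it flags missing keys, duplicate keys, malformed emails and out-of-range ages in a hypothetical `Record` type. The field names, the simplified email pattern and the 0 to 130 age range are all illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical customer record; fields and rules are illustrative assumptions.
@dataclass
class Record:
    customer_id: str
    email: str
    age: Optional[int]

# Deliberately simple email pattern: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def find_bad_records(records: List[Record]) -> List[Tuple[int, str]]:
    """Return (index, reason) pairs for records that fail simple quality rules."""
    issues = []
    seen_ids = set()
    for i, r in enumerate(records):
        # Completeness and uniqueness of the key.
        if not r.customer_id:
            issues.append((i, "missing customer_id"))
        elif r.customer_id in seen_ids:
            issues.append((i, "duplicate customer_id"))
        else:
            seen_ids.add(r.customer_id)
        # Format validity.
        if not EMAIL_RE.match(r.email):
            issues.append((i, "invalid email"))
        # Completeness and plausible range.
        if r.age is None or not 0 <= r.age <= 130:
            issues.append((i, "missing or out-of-range age"))
    return issues

records = [
    Record("c1", "a@example.com", 34),
    Record("c1", "b@example.com", 29),
    Record("", "not-an-email", 200),
]
print(find_bad_records(records))
```

Returning reasons rather than silently dropping rows lets the findings be traced back to the source department that produced them.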
Data Handling Ethics:
Data Governance and Stewardship:
Data Architecture:
Data Modeling and Design:
Data Storage and Operations:
Data Security:
Data Integration and Interoperability:
Document and Content Management:
Reference and Master Data:
Data Warehousing and Business Intelligence:
Me……
The terms “Data Lake”, “Data Warehouse” and “Data Mart” are often used interchangeably. But what exactly are the differences between them? This post attempts to explain their similarities and differences, and when to use each.
A high-level comparis……
A data mart is focused on a single functional area of an organization and contains a subset of data stored in a Data Warehouse.
A data mart is a condensed version of a Data Warehouse, designed for use by a specific department, unit or set of users in an organization, e.g., Marketing, Sales, HR……
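To illustrate "a subset of data stored in a Data Warehouse" for one department, here is a minimal sketch assuming SQLite through Python's built-in sqlite3 module. The table `dw_transactions` and view `mart_sales` are hypothetical, and a view is a simplification: in practice a data mart is often a separately loaded physical store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table holding data for several departments.
conn.execute("CREATE TABLE dw_transactions (dept TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO dw_transactions VALUES (?, ?, ?)",
    [
        ("Sales", "deal-a", 100.0),
        ("Marketing", "campaign-x", 40.0),
        ("Sales", "deal-b", 60.0),
        ("HR", "training", 25.0),
    ],
)
# Sales data mart: only the Sales rows and only the columns that department needs.
conn.execute(
    "CREATE VIEW mart_sales AS "
    "SELECT item, amount FROM dw_transactions WHERE dept = 'Sales'"
)
print(conn.execute("SELECT item, amount FROM mart_sales ORDER BY item").fetchall())
```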
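Abstract: In current development work, irregular formatting of database tables and columns often slows down progress. When the existing tables are reused later, poor readability and inconsistent column naming rules also make querying and using the data inefficient. It is therefore necessary to establish a suitable set of naming conventions for database tables and columns to solve these problems.
This article covers database naming, da…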
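The excerpt does not state the actual naming rules, so this is only a minimal sketch of how such conventions could be checked automatically. It assumes hypothetical rules: names must be lowercase snake_case starting with a letter, be at most 30 characters long, and avoid a small sample of SQL reserved words (the `RESERVED` set is illustrative, not a complete list).

```python
import re

# Hypothetical rules; these are assumptions, not the article's actual conventions.
NAME_RE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")  # lowercase snake_case
MAX_LEN = 30
RESERVED = {"order", "user", "group", "select", "table"}  # small sample only

def check_name(name: str, max_len: int = MAX_LEN) -> list:
    """Return a list of rule violations for a table or column name (empty if valid)."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("not lowercase snake_case")
    if len(name) > max_len:
        problems.append(f"longer than {max_len} characters")
    if name.lower() in RESERVED:
        problems.append("is a SQL reserved word")
    return problems

for candidate in ["user_order", "UserOrder", "order"]:
    print(candidate, check_name(candidate))
```

A check like this can run in code review or CI so that naming drift is caught before a table is created.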
Codd’s 12 Rules (for a relational database product) are (still) frequently informally cited, but their original text turns out to be annoyingly difficult to find. They are reprinted here under the principles of fair use and/or fair dealing and have been extracted, verbatim, from ‘Is your DBMS real……