Big data frameworks and products keep emerging, each with its own characteristics. The following is a summary and comparative analysis of the three most commonly used open-source frameworks, Apache Hadoop, Spark and Flink:
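The data warehousing field has two masters: Bill Inmon, the "father of the data warehouse", and Ralph Kimball, a leading authority on data warehousing. Each wrote a classic book, Inmon's "Building the Data Warehouse" and Kimball's "The Data Warehouse Toolkit". The two books also represent two different approaches to building a data warehouse, and these two architectural patterns have underpinned the development of data warehousing and business intelligence for nearly twenty years. Today we will talk about……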
A. When optimizing queries, avoid full table scans wherever possible; first consider creating indexes on the columns used in WHERE and ORDER BY clauses.
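One way to see the effect of tip A is to compare query plans before and after adding an index. A minimal sketch, assuming Python's built-in sqlite3 module as the engine and a hypothetical table t(id, num) filled with dummy rows; other engines have their own EXPLAIN syntax and output format:

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN steps joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the human-readable description of a step.
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the t(id, num) used in the tips.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany("INSERT INTO t (num) VALUES (?)", [(i % 100,) for i in range(1000)])

where_sql = "SELECT id FROM t WHERE num = 42"
order_sql = "SELECT id FROM t ORDER BY num"

# Plans without any index on num.
before_where = query_plan(conn, where_sql)
before_order = query_plan(conn, order_sql)

# Index the column used in both the WHERE and the ORDER BY clause.
conn.execute("CREATE INDEX idx_num ON t (num)")

# Plans after the index exists.
after_where = query_plan(conn, where_sql)
after_order = query_plan(conn, order_sql)

print("WHERE    before:", before_where)
print("WHERE    after: ", after_where)
print("ORDER BY before:", before_order)
print("ORDER BY after: ", after_order)
```

In SQLite the WHERE query should change from a SCAN of t to a SEARCH using idx_num, and the ORDER BY query should no longer need a temporary B-tree to sort the result.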
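B. Avoid testing a column for NULL in the WHERE clause where possible; otherwise the engine may stop using the index and fall back to a full table scan, for example: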
select id from t where num is null
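Whether a NULL test really disables the index depends on the engine and its version; MySQL and SQLite, for instance, can use an index for `IS NULL`. Rather than relying on the rule of thumb, it is worth checking the plan. A minimal sketch, again assuming Python's sqlite3 module and the same hypothetical table t(id, num), with every tenth row inserted as NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table t(id, num); num is nullable so IS NULL is meaningful.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany(
    "INSERT INTO t (num) VALUES (?)",
    [(None if i % 10 == 0 else i,) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_num ON t (num)")

sql = "SELECT id FROM t WHERE num IS NULL"

# Ask the planner how it intends to run the query (last column = step text).
plan = "; ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))
print(plan)

# Run the query itself to confirm it returns the NULL rows.
null_ids = [row[0] for row in conn.execute(sql)]
print(len(null_ids))
```

If the plan mentions idx_num, this engine is using the index for the NULL test; if it reports a plain SCAN, tip B applies to it.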
Data Handling Ethics:
Data Governance and Stewardship:
Data Architecture:
Data Modeling and Design:
Data Storage and Operations:
Data Security:
Data Integration and Interoperability:
Document and Content Management:
Reference and Master Data:
Data Warehousing and Business Intelligence:
The terms “Data Lake”, “Data Warehouse” and “Data Mart” are often used interchangeably. But what exactly are the differences between them? This post attempts to explain the similarities, the differences and when to use each.
A high-level comparis……
A data mart is focused on a single functional area of an organization and contains a subset of data stored in a Data Warehouse.
A data mart is a condensed version of a Data Warehouse and is designed for use by a specific department, unit or set of users in an organization. E.g., Marketing, Sales, HR……
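As a rough illustration of "a subset of the warehouse for one department", here is a minimal sketch using Python's sqlite3 module. The table warehouse_fact, its columns and the view marketing_mart are all hypothetical, and a view is only one possible implementation; in practice marts are often physically separate, materialized schemas:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical enterprise-wide warehouse table covering several departments.
conn.execute("CREATE TABLE warehouse_fact (dept TEXT, metric TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_fact VALUES (?, ?, ?)",
    [
        ("Marketing", "campaign_spend", 1200.0),
        ("Marketing", "leads", 340.0),
        ("Sales", "revenue", 9800.0),
        ("HR", "headcount", 57.0),
    ],
)

# A dependent data mart for Marketing: a condensed subset of the warehouse
# restricted to one functional area. Here it is a view; a materialized copy
# (CREATE TABLE ... AS SELECT) would serve the same purpose.
conn.execute(
    """
    CREATE VIEW marketing_mart AS
    SELECT metric, amount
    FROM warehouse_fact
    WHERE dept = 'Marketing'
    """
)

# Marketing users query only their mart, not the whole warehouse.
mart_rows = conn.execute(
    "SELECT metric, amount FROM marketing_mart ORDER BY metric"
).fetchall()
print(mart_rows)
```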
Codd’s 12 Rules (for a relational database product) are (still) frequently informally cited, but their original text turns out to be annoyingly difficult to find. They are reprinted here under the principles of fair use and/or fair dealing and have been extracted, verbatim, from ‘Is your DBMS real……