
Big Data Glossary by Pete Warden


To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. The descriptions are based on first-hand experience with these tools in a production environment.

This handy glossary also includes a chapter of key terms that help define many of these tool categories:

  • NoSQL Databases—Document-oriented databases using a key/value interface rather than SQL
  • MapReduce—Tools that support distributed computing on large datasets
  • Storage—Technologies for storing data in a distributed way
  • Servers—Ways to rent computing power on remote machines
  • Processing—Tools for extracting valuable information from large datasets
  • Natural Language Processing—Methods for extracting information from human-created text
  • Machine Learning—Tools that automatically perform data analyses, based on the results of a one-off analysis
  • Visualization—Applications that present meaningful data graphically
  • Acquisition—Techniques for cleaning up messy public data sources
  • Serialization—Methods to convert a data structure or object state into a storable format
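The MapReduce category above is built on a simple two-phase pattern. Below is a minimal single-process sketch of the classic word-count example, assuming an in-memory list of strings as input. A real framework such as Hadoop would run the map and reduce calls in parallel across many machines and perform the shuffle over the network. The function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group all emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine every value seen for one key into a single result.
    return key, sum(values)


def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = map_reduce(["big data tools", "Big data glossary"])
print(counts)  # {'big': 2, 'data': 2, 'tools': 1, 'glossary': 1}
```

Because each map call touches only one record and each reduce call touches only one key, both phases can be split across workers without coordination.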



Best data modeling & design books

Distributed Object-Oriented Data-Systems Design

This guide illustrates what constitutes an advanced distributed information system, and how to design and implement one. The author presents the major components of an advanced distributed information system: a data management system supporting many classes of data; a distributed (networked) environment supporting LANs or WANs with multiple database servers; and an advanced user interface.

Modeling and Data Mining in Blogosphere (Synthesis Lectures on Data Mining and Knowledge Discovery)

This book offers a comprehensive overview of the various concepts and research issues around blogs, or weblogs. It introduces techniques and approaches, tools and applications, and evaluation methodologies, with examples and case studies. Blogs allow people to express their thoughts, voice their opinions, and share their experiences and ideas.

Morphological Modeling of Terrains and Volume Data

This book describes the mathematical background behind discrete approaches to the morphological analysis of scalar fields, with a focus on Morse theory and on the discrete theories due to Banchoff and Forman. The algorithms and data structures presented are used for terrain modeling and analysis, molecular shape analysis, and the analysis or visualization of 3D sensor and simulation data sets.

Object-Role Modeling Fundamentals: A Practical Guide to Data Modeling with ORM

Object-Role Modeling (ORM) is a fact-based approach to data modeling that expresses the information requirements of any business domain simply in terms of objects that play roles in relationships. All facts of interest are treated as instances of attribute-free structures known as fact types, where the relationship may be unary (e.

Extra resources for Big Data Glossary

Example text

The command-line interface allows you to apply exactly the same code in an automated way for production.

Mahout

Mahout is an open source framework that can run common machine learning algorithms on massive datasets. To achieve that scalability, most of the code is written as parallelizable jobs on top of Hadoop. It comes with algorithms to perform a lot of common tasks, like clustering and classifying objects into groups, recommending items based on other users' behaviors, and spotting attributes that frequently occur together.
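A common way to recommend items based on other users' behaviors is to use item co-occurrence counts. Below is a small in-memory sketch of that idea, assuming a hypothetical dictionary of per-user item histories. It is not Mahout's API. Mahout's contribution is running this kind of computation as parallel Hadoop jobs over far larger data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(histories):
    # Count how often each pair of items appears in the same user's history.
    counts = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(user, histories, top_n=3):
    # Score items the user lacks by how often they co-occur with items they have.
    counts = cooccurrence_counts(histories)
    owned = set(histories[user])
    scores = Counter()
    for (a, b), n in counts.items():
        if a in owned and b not in owned:
            scores[b] += n
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical example data.
histories = {
    "ann": ["hadoop", "pig", "hive"],
    "bob": ["hadoop", "pig", "mahout"],
    "cat": ["hadoop", "mahout"],
    "dan": ["hadoop"],
}
print(recommend("dan", histories))  # ['mahout', 'pig', 'hive']
```

The pairwise counting step is the expensive part. It is also the part that maps naturally onto a MapReduce job: the mappers emit item pairs and the reducers sum them.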

For example, it’s easy to spot and correct common problems like typos or inconsistencies in text values and to change cells from one format to another. There’s also rich support for linking data by calling APIs with the data contained in existing rows to augment the spreadsheet with information from external sources. Refine doesn’t let you do anything you can’t with other tools, but its power comes from how well it supports a typical extract and transform workflow. It feels like a good step up in abstraction, packaging processes that would typically take multiple steps in a scripting language or spreadsheet package into single operations with sensible defaults.
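One way to spot inconsistencies in text values is key-collision clustering. The approach computes a normalized key for each cell and groups cells whose keys match. The sketch below assumes a normalization similar in spirit to OpenRefine's "fingerprint" method: lowercase the text, strip accents and punctuation, and sort the unique tokens. The exact rules Refine applies may differ, and the function names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    # Normalize a text cell so trivially different spellings share one key.
    value = unicodedata.normalize("NFKD", value.strip().lower())
    value = value.encode("ascii", "ignore").decode("ascii")  # drop accent marks
    value = re.sub(r"[^\w\s]", "", value)  # drop punctuation
    tokens = sorted(set(value.split()))  # dedupe and sort the tokens
    return " ".join(tokens)


def cluster(values):
    # Group the original values whose fingerprints collide; keep only real groups.
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


cells = ["New York", "new york ", "York, New", "Boston", "Café Nero", "cafe nero"]
print(cluster(cells))
# [['New York', 'new york ', 'York, New'], ['Café Nero', 'cafe nero']]
```

Each cluster is only a suggestion. A person still chooses the canonical value before the cells are merged.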

If you’re representing a list of objects mapping keys to values, the most intuitive way would be to use an indexed array of associative arrays. This means that the string for each key is stored inside each object, which involves a large number of duplicated strings when the number of unique keys is small compared to the number of values. There are manual ways around this, of course, especially as the textual representations usually compress well, but many of the other serialization approaches I’ll talk about try to combine the flexibility of JSON with a storage mechanism that’s more space efficient.
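The duplication described above is easy to measure. The sketch below uses Python's standard json module and assumes a hypothetical list of 100 uniform records. The "keys once" layout is just one illustrative alternative. It is not any specific serialization format covered in the book, which typically apply the same idea in a binary encoding.

```python
import json

# Hypothetical uniform records: few unique keys, many values.
records = [{"id": i, "name": f"user{i}", "score": i * 10} for i in range(100)]

# Intuitive layout: an array of objects, repeating every key string per row.
row_oriented = json.dumps(records)

# Alternative layout: store the key strings once, then only the values per row.
keys = list(records[0].keys())
keys_once = json.dumps({"keys": keys, "rows": [[r[k] for k in keys] for r in records]})

print(len(row_oriented), len(keys_once))

# The compact form can be expanded back into the original list of objects.
loaded = json.loads(keys_once)
restored = [dict(zip(loaded["keys"], row)) for row in loaded["rows"]]
assert restored == records
```

The savings grow with the ratio of rows to unique keys. The trade-off is that every record must share the same keys.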

