Pro Hadoop Data Analytics: Designing and Building Big Data Systems Using the Hadoop Ecosystem
about this item
Learn advanced analytical techniques and leverage existing toolkits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation. In Pro Hadoop Data Analytics, best practices are emphasized to ensure coherent, efficient development. A Git contribution will be provided as an end-to-end example of the techniques described in the book.
The book emphasizes four important topics:
- The importance of end-to-end, flexible, configurable, high-performance data pipeline systems with analytical components as well as appropriate visualization results. Deep-dive topics will include Spark, H2O, Vowpal Wabbit (NLP), Stanford NLP, and other appropriate toolkits and plugins.
- Best practices and structured design principles. This will include strategic topics as well as the how-to example portions.
- The importance of mix-and-match, or hybrid, systems to accomplish application goals. The hybrid approach will be prominent in the example deep dives.
- The use of existing third-party libraries as a key to effective development. Deep-dive examples of the functionality of some of these toolkits will be showcased as you develop the example system.
A complete example system, built from standard third-party components and submitted to Git, will consist of the toolkits, libraries, visualization and reporting code, and support glue needed to provide a working and extensible end-to-end system.
Effective data analytics is challenging, especially when the data is complex, high-volume, or unstructured. Distributed solutions have recently become available, but building end-to-end analytical systems using Hadoop and its ecosystem has remained extremely difficult.
What You'll Learn
- The what, why, and how of building big data analytic systems with the Hadoop ecosystem
- Libraries, toolkits, and algorithms to make development easier and more effective
- Best practices to use when building analytic systems with Hadoop, and metrics to measure performance and efficiency of components and systems
- How to connect to standard relational databases, NoSQL data sources, and more
- Useful case studies and example components that assist you in creating your own systems