

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems 1st Edition
Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?
In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
- Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
- Make informed decisions by identifying the strengths and weaknesses of different tools
- Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
- Understand the distributed systems research upon which modern databases are built
- Peek behind the scenes of major online services, and learn from their architectures
- ISBN-10: 1449373321
- ISBN-13: 978-1449373320
- Edition: 1st
- Publisher: O'Reilly Media
- Publication date: April 2, 2017
- Language: English
- Dimensions: 7.01 x 1.24 x 9.17 inches
- Print length: 614 pages

- Scalability is the term we use to describe a system’s ability to cope with increased load. (Highlighted by 3,058 Kindle readers)
- However, if your application does use many-to-many relationships, the document model becomes less appealing. (Highlighted by 2,857 Kindle readers)
- Set up detailed and clear monitoring, such as performance metrics and error rates. (Highlighted by 2,634 Kindle readers)
From the brand

Sharing the knowledge of experts
O'Reilly's mission is to change the world by sharing the knowledge of innovators. For over 40 years, we've inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
From the Publisher

Who Should Read This Book?
If you develop applications that have some kind of server/backend for storing or processing data, and your applications use the internet (e.g., web applications, mobile apps, or internet-connected sensors), then this book is for you.
This book is for software engineers, software architects, and technical managers who love to code. It is especially relevant if you need to make decisions about the architecture of the systems you work on—for example, if you need to choose tools for solving a given problem and figure out how best to apply them. But even if you have no choice over your tools, this book will help you better understand their strengths and weaknesses.
You should have some experience building web-based applications or network services, and you should be familiar with relational databases and SQL. Any non-relational databases and other data-related tools you know are nice, but not required.
A general understanding of common network protocols like TCP and HTTP is helpful. Your choice of programming language or framework makes no difference for this book.
If any of the following are true for you, you’ll find this book valuable:
- You want to learn how to make data systems scalable, for example, to support web or mobile apps with millions of users.
- You need to make applications highly available (minimizing downtime) and operationally robust.
- You are looking for ways of making systems easier to maintain in the long run, even as they grow and as requirements and technologies change.
- You have a natural curiosity for the way things work and want to know what goes on inside major websites and online services. This book breaks down the internals of various databases and data processing systems, and it’s great fun to explore the bright thinking that went into their design.

Sometimes, when discussing scalable data systems, people make comments along the lines of, 'You’re not Google or Amazon. Stop worrying about scale and just use a relational database'. There is truth in that statement: building for scale that you don’t need is wasted effort and may lock you into an inflexible design. In effect, it is a form of premature optimization. However, it’s also important to choose the right tool for the job, and different technologies each have their own strengths and weaknesses. As we shall see, relational databases are important but not the final word on dealing with data.
Scope of This Book
This book does not attempt to give detailed instructions on how to install or use specific software packages or APIs, since there is already plenty of documentation for those things. Instead we discuss the various principles and trade-offs that are fundamental to data systems, and we explore the different design decisions taken by different products.
We look primarily at the architecture of data systems and the ways they are integrated into data-intensive applications. This book doesn’t have space to cover deployment, operations, security, management, and other areas—those are complex and important topics, and we wouldn’t do them justice by making them superficial side notes in this book. They deserve books of their own.
Many of the technologies described in this book fall within the realm of the Big Data buzzword. However, the term 'Big Data' is so overused and underdefined that it is not useful in a serious engineering discussion. This book uses less ambiguous terms, such as single-node versus distributed systems, or online/interactive versus offline/batch processing systems.
This book has a bias toward free and open source software (FOSS), because reading, modifying, and executing source code is a great way to understand how something works in detail. Open platforms also reduce the risk of vendor lock-in. However, where appropriate, we also discuss proprietary software (closed-source software, software as a service, or companies’ in-house software that is only described in literature but not released publicly).
Editorial Reviews
About the Author
Martin is a researcher in distributed systems at the University of Cambridge. Previously he was a software engineer and entrepreneur at Internet companies including LinkedIn and Rapportive, where he worked on large-scale data infrastructure. In the process he learned a few things the hard way, and he hopes this book will save you from repeating the same mistakes.
Martin is a regular conference speaker, blogger, and open source contributor. He believes that profound technical ideas should be accessible to everyone, and that deeper understanding will help us develop better software.
Product details
- Publisher : O'Reilly Media; 1st edition (April 2, 2017)
- Language : English
- Paperback : 614 pages
- ISBN-10 : 1449373321
- ISBN-13 : 978-1449373320
- Item Weight : 2.13 pounds
- Dimensions : 7.01 x 1.24 x 9.17 inches
- Best Sellers Rank: #2,972 in Books (See Top 100 in Books)
- #1 in Data Modeling & Design (Books)
- #2 in Computer Science (Books)
- #3 in Computer Software (Books)
About the author

Martin Kleppmann is a researcher in distributed systems and security at the University of Cambridge, and author of Designing Data-Intensive Applications (O'Reilly Media, 2017). Previously he was a software engineer and entrepreneur at Internet companies including LinkedIn and Rapportive, where he worked on large-scale data infrastructure. He is now working on TRVE DATA, a project that aims to bring end-to-end encryption and decentralisation to a wide range of applications.
Customer reviews
Customers say
Customers find the book provides fundamental concepts and hundreds of valuable references, making it well-written and detailed. Moreover, the content is packed with information, and customers appreciate its coverage of distributed systems. However, the book receives mixed feedback regarding data storage, with some praising its conceptual overview while others find it superficial. Additionally, several customers report missing pages in their copies.
AI-generated from the text of customer reviews
Customers find the book well-written and easy to understand, with one customer noting how the author simplifies complex concepts.
"...It teaches you to read between the lines on how certain technologies work so that you can identify the pros and cons early and without needing them..."
"...In addition to the author's abundant and effective simple line diagrams that are reminiscent (although more sophisticated) of his earlier diagrams,..."
"...Written in understandable and easy words, the flow of the content is completely straightforward and relevant to each other...."
"...that form the formal foundations of the subject, but it is so clearly written, and accessible, that you could go a very long way without needing to..."
Customers praise the book's thorough presentation of topics and hundreds of valuable references, with one customer noting how it masterfully summarizes 30+ years of theory.
"...Along those same lines it is excellent at circling back to concepts introduced at prior points in the book...."
"...In my opinion, the last chapter is probably the most abstract simply because it explores ideas about how the tools covered in the prior two chapters..."
"...the flow of the content is completely straightforward and relevant to each other. The content is managed due to dependencies respectively"
"...It has all the references to the classic papers and textbooks that form the formal foundations of the subject, but it is so clearly written, and..."
Customers love the content of the book and find it a pleasure to read, with one customer noting it's a must-read for both beginners and professionals.
"...All 600ish pages are worth reading, and it's presented in an excellent, engaging way with real world practical examples for everything."
"Love this book, helped to prepare for system design reviews for Google, Amazon and Facebook"
"...fits into the bigger landscape (and why), this book is one of the best overviews I've ever read, for any discipline...."
"...Mr Kleppmann does a great job of articulating the "systems" aspects of data engineering...."
Customers find the book worth the purchase and great value for money.
"...Definitely worth the purchase unless your career has already been full of replication, partitioning, and distributed data systems...."
"...etc. Also, the maps alone are worth the price of the book :)..."
"...Well worth picking up."
"...This book is worth every penny."
Customers appreciate the content of the book, which provides a great overview and is packed with information, with one customer noting it includes references and articles for each topic.
"...In addition to the author's abundant and effective simple line diagrams that are reminiscent (although more sophisticated) of his earlier diagrams,..."
"...Martin Kleppmann provided us with a fantastic, high-level navigational map through the many seas and lands of the databases and he gave us the tools..."
"...The book deals with all the stuff that happens around data engineering: storage, models, structures, access patterns, encoding, replication,..."
"...is only one subset of computer science but this book and knowledge base is a great resource for getting into the world of systems that many large..."
Customers have mixed opinions about the book's coverage of data storage, with some praising it as a fantastic conceptual overview while others find it superficial.
"...maintainable applications, data models and query languages, storage and retrieval, and encoding and evolution, (2) distributed data, which covers..."
"...would be fine. But it goes into painfully detailed description of this minor topic."
"...foundational aspects of both distributed computing and data storage in a coherent manner...."
"...The chapters on partitioning data and the “trouble with distributed systems” were good though...."
Reviews with images

Excellent book whether or not you have experience
Top reviews from the United States
- Reviewed in the United States on June 1, 2020
Designing Data-Intensive Applications really exceeded my expectations. Even if you are experienced in this area this book will reinforce things you know (or sort of know) and bring to light new ways of thinking about solving distributed systems and data problems. It will give you a solid understanding of how to choose the right tech for different use cases.
The book really pulls you in with an intro that is more high level, but mentions problems and solutions that really anyone who has worked on these types of applications has either encountered or heard mention of. The promise it makes is to take these issues such as scalability, maintainability and durability and explain how to decide on the right solutions to these issues for the problems you are solving. It does an amazing job of that throughout the book.
This book covers a lot, but at the same time it knows exactly when to go deep on a subject. Right when it seems like it may be going too deep on things like how different types of databases are implemented (SSTables, B-trees, etc.) or on comparing different consensus algorithms, it is quick to point out how and why those things are important to practical real-world problems and how understanding those things is actually vital to the success of a system.
Along those same lines it is excellent at circling back to concepts introduced at prior points in the book. For example the book goes into how log based storage is used for some databases as their core way of storing data and for durability in other cases. Later in the book when getting into different message/eventing systems such as Kafka and ActiveMQ things swing back to how these systems utilize log based storage in similar ways. Even if you have prior knowledge or even have worked with these technologies, how and why they work and the pros and cons of each become crystal clear and really solidified. Same can be said of its great explanations of things like ZooKeeper and why specific solutions like Kafka make use of it.
This book is also amazing at shedding light on the fact that so little of what is out there is totally new; it attempts to go back as far as it can at times on where a certain technology's ideas originated (back to the 1800s at some points!). Bringing in this history really gives a lot of context around the original problems that were being solved, which in turn helps understanding pros and cons. One example is the way it goes through the history of batch processing systems and HDFS. The author starts with MapReduce and relates it to tech that was developed decades before. This really clarifies how we got from batch processing systems on proprietary hardware to things like MapReduce on commodity hardware thanks in part to HDFS, eventually to stream based processing. It also does great at explaining the pros and cons of each and when one might choose one technology over the other.
That's really the theme of this book, teaching the reader how to compare and contrast different technologies for solving distributed systems and data problems. It teaches you to read between the lines on how certain technologies work so that you can identify the pros and cons early and without needing them to be spelled out by the authors of those technologies. When thinking about databases it teaches you to really consider the durability/scalability model and how things are nowhere near black and white between "consistent" vs "eventually consistent". There is a ton of nuance there and it goes deep on things like single vs multi leader vs leaderless, linearizability, total order broadcast, and different consensus algorithms.
I could go on forever about this book. To name a few other things it touches on to get a good idea of the breadth here: networking (and networking faults), OLAP, OLTP, 2 phase locking, graph databases, 2 phase commit, data encoding, general fault tolerance, compatibility, message passing, everything I mentioned above, and the list goes on and on and on. I recommend anyone who does any kind of work with these systems takes the time to read this book. All 600ish pages are worth reading, and it's presented in an excellent, engaging way with real world practical examples for everything.
- Reviewed in the United States on June 2, 2018
Kleppmann mentioned during his "Turning the Database Inside Out with Apache Samza" talk at Strange Loop 2014 (see my notes) that he was on sabbatical working on this book, and while waiting quite some time for it to be published, I ended up experimenting with his Bottled Water project as well as Apache Kafka (which was only at release 0.8.2.2 at that point in time). Other reviewers are correct that much of the material included in this book is available elsewhere, but this book is packaged well (although still at 550 pages and heavyweight), with most of the key topics associated with data-intensive applications under one roof with good explanations and numerous footnotes which point to resources providing additional detail.
Content is broken down into 3 sections and 12 chapters: (1) foundations of data systems, which covers reliable, scalable, and maintainable applications, data models and query languages, storage and retrieval, and encoding and evolution, (2) distributed data, which covers replication, partitioning, transactions, the trouble with distributed systems, and consistency and consensus, and (3) derived data, which covers batch processing, stream processing, and the future of data systems. The latter 6 chapters are weighted more heavily, with chapter 9 on consistency and consensus, and chapter 12 on the future of data systems, the most lengthy with each comprising about 12% of the book.
Some potential readers might be disappointed that this book is all theory, but while the author does not provide any code he discusses practical implementation and specific details when applicable for comparisons within a product category. In my opinion, the last chapter is probably the most abstract simply because it explores ideas about how the tools covered in the prior two chapters might be used in the future to build reliable, scalable, and maintainable applications. Similarly, the chapter on the opposite end of this book sets the stage well for any developer of nontrivial applications with its section on thinking about database systems and the concerns around reliability, scalability, and maintainability.
About a year ago, I recall an executive colleague responding to me with a quizzical look when I mentioned that tooling for data and application development is converging over time, and just a few months prior I mentioned in a presentation to developers that transactional and analytical capabilities are being provided more and more by single database products, with one executive in the audience shaking his head in disagreement that kappa rather than lambda architectures are the way to go. Kleppmann mentions that we typically think of databases, message brokers, caches, etc. as residing in very different categories of tooling because each of these has very different access patterns, meaning different performance characteristics and therefore different implementations.
So why should all of this tooling not be lumped together under an umbrella term such as 'data systems'? Many products for data storage and processing have emerged in recent years, optimized for a variety of use cases and no longer neatly fitting into traditional categories: the boundaries between categories are simply becoming blurred, and since a single tool can no longer satisfy the data processing and storage needs for many applications, work is broken down into tasks that can be performed efficiently on a single system that is often comprised of different tooling stitched together by application code under the covers.
In addition to the author's abundant and effective simple line diagrams that are reminiscent (although more sophisticated) of his earlier diagrams, one aspect that I especially appreciate is the nomenclature comparisons between products when walking through terminology. For example, at the beginning of chapter 6, the author specifically calls out the terminological confusion that exists with respect to partitioning. "What we call a 'partition' here is called a 'shard' in MongoDB, Elasticsearch, and SolrCloud; it's known as a 'region' in HBase, a 'tablet' in Bigtable, a 'vnode' in Cassandra and Riak, and a 'vBucket' in Couchbase. However, partitioning is the most established term, so we'll stick to that."
In addition, Kleppmann walks through differences between products when the same terminology is being used, which can also lead to confusion. For example, in chapter 7 the author provides a great 5-page discussion on the meaning of "ACID" (atomicity, consistency, isolation, and durability), which was an effective reminder to me that while this term was coined in 1983 in an effort to establish precise terminology for fault-tolerance mechanisms in databases, in practice one database's implementation of ACID does not equal another's implementation. "Today, when a system claims to be 'ACID compliant', it's unclear what guarantees you can actually expect. ACID has unfortunately become mostly a marketing term."
If you've ever found yourself confused about the concept of "consistency", the author offers a sanity check that your confusion is warranted, not only because the term is "terribly overloaded" with at least four different meanings, but because "the letter C doesn't really belong in ACID" since it was "tossed in to make the acronym work" in the original paper, and that "it wasn't considered important at the time." The reality is that "atomicity, isolation, and durability are properties of the database, whereas consistency (in the ACID sense) is a property of the application. The application may rely on the database's atomicity and isolation properties in order to achieve consistency, but it's not up to the database alone."
And later in chapter 9 where he discusses consistency and consensus, the author provides a great sidebar on "the unhelpful CAP theorem". As Kleppmann later comments, "the CAP theorem as formally defined is of very narrow scope: it only considers one consistency model (namely linearizability) and one kind of fault (network partitions, or nodes that are alive but disconnected from each other). It doesn't say anything about network delays, dead nodes, or other trade-offs. Thus, although CAP has been historically influential, it has little practical value for designing systems."
The author concludes in a sidebar by commenting that "all in all, there is a lot of misunderstanding and confusion around CAP, and it does not help us understand systems better, so CAP is best avoided." This is because "CAP is sometimes presented as 'Consistency, Availability, Partition tolerance: pick 2 out of 3'. Unfortunately, putting it this way is misleading because network partitions are a kind of fault, so they aren't something about which you have a choice: they will happen whether you like it or not...A better way of phrasing CAP would be 'either Consistent or Available when Partitioned'. A more reliable network needs to make this choice less often, but at some point the choice is inevitable."
While the second section of this text on distributed data was most beneficial to me, the third section on derived data was least beneficial, mainly because I'm already familiar with these topics from recent readings and experience, and because I needed to refamiliarize myself with the content discussed in the second section. However, the author presents derived data well, and I certainly do not recommend skipping this section. As Kleppmann comments, the issues around integrating multiple different data systems into one coherent application architecture are often overlooked by vendors who claim that their product can satisfy all of your needs. In reality, integrating disparate systems (which can be grouped into the two broad categories of "systems of record" and "derived data systems") is one of the most important things that needs to be done in a nontrivial application. I highly recommend this text.
- Reviewed in the United States on January 5, 2025
This book provides extensive knowledge about data management and systems. Written in plain, understandable language, the content flows in a straightforward way, with each part relevant to the next. The content is organized according to its dependencies.
Top reviews from other countries
- Giuseppe Calabrese, reviewed in Italy on July 21, 2024
5.0 out of 5 stars Comprehensive introduction to the topic
The book includes most of the contemporary techniques to address the topic of distributed data-intensive applications.
I really loved the list of references at the end of every chapter.
It also works as a reference book for continuous use at work.
- Felix Kunde, reviewed in Germany on September 6, 2017
5.0 out of 5 stars A magnificent guide through the landscape of distributed systems
Wow. It is rare for an IT book to leave me this enthusiastic. Anyone who is interested in new Big Data possibilities (without having Big Data themselves ;) but feels somewhat lost in the software jungle of different scaling concepts gets a very well-founded overview here (maps included). These days it feels as if new frameworks or databases are blogged about every week. First MapReduce was the thing, then batch processing, then stream processing…
All of that can be found in this book too. So it helps when reading if you already have a rough overview of the market yourself. Basic knowledge of databases is, I think, a must, since the book is written more for backend people. But instead of unpacking algorithms and APIs with code listings and giving a tutorial with sample datasets (which the title of the book might lead you to expect), it is more about the ideas behind the various approaches and how they all relate to each other and to familiar, long-established solutions. Individual software products are only side notes here. In return, however, the end of every chapter offers a fabulous list of further reading, which clearly shows the author's passion for the topic.
I found it exciting how the book draws an arc from simple, manageable database architectures (single-leader) to distributed systems. It is surprising how many integrity problems can arise for data (to quote a colleague: "Distributed systems are hell"). Thanks to the simple flow diagrams, the lurking dangers are always immediately understandable. The author also uses cross-references very often and repeats explanations several times. If you read long passages in one go, it is perhaps a bit too much of a good thing. But it is especially helpful when you pick the book up again to continue reading or want to look up particular sections again. I will certainly do the latter many more times.
- Nikola Zifra, reviewed in the United Arab Emirates on September 18, 2024
3.0 out of 5 stars Lacks details
This book provides a high-level overview but unfortunately lacks quite a bit of detail.
- Joachim O., reviewed in the United Kingdom on November 17, 2024
5.0 out of 5 stars Great in-depth analysis of data architectures
This book covers pretty much all topics which are relevant to managing databases or designing data models in more than 800 pages. It also provides detailed information about the inner workings of databases to the degree that you might be able to implement your own simple database.
The book is didactically very well structured, which is no surprise given that the author is a professor at Cambridge. For example, it explains batch processing algorithms (e.g. MapReduce) and uses this as a basis to delve into data streaming. Strong emphasis is laid on the problems of distributed computing (replication, partitioning, node failures, etc.) and the discussion of the compromises one must make.
Overall, an easy recommendation for anyone who is interested in data architectures and the inner workings of databases, which are the backbone of pretty much any application in today’s world.
- Mishan Janitha, reviewed in Japan on April 12, 2024
5.0 out of 5 stars About Book
Recommended book for software engineers.