{"id":11868,"date":"2021-09-29T11:00:06","date_gmt":"2021-09-29T18:00:06","guid":{"rendered":"https:\/\/softwareengineeringdaily.com\/?p=11868"},"modified":"2021-09-28T14:51:23","modified_gmt":"2021-09-28T21:51:23","slug":"9-fake-data-anti-patterns-and-how-to-avoid-them","status":"publish","type":"post","link":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/","title":{"rendered":"9 Fake Data Anti-patterns and How to Avoid Them"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">There are wrong ways to fake your data. Whether you\u2019re bootstrapping a dev environment, automating integration tests, or capacity testing in staging, we all need high-quality fake data. Regardless of your use-case, common data generation pitfalls can break your testing or, worse, leak sensitive data into unsecured environments.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There are ways to generate synthetic test data that achieve the realism required for effective testing along with the security needed to protect your company and customers. 
The secret lies in identifying potential anti-patterns and then preventing them from ever forming in the first place.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the context of test data generation, we see anti-patterns emerging in one of three key ways.<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The process fails to mimic the complexity and requirements of real-world situations.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The process fails to protect the privacy of the individuals behind the numbers.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The process fails to work effectively for all data types and sources in the data ecosystem being mimicked.<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">The outcome of these failures? Broken data, worthless tests, bugs in production, and in the worst cases, a data security crisis.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A strong data generation infrastructure should have built-in tools to enable its users to generate the patterns they need as opposed to random data. Here, we\u2019ll explore the bad (anti-patterns) to understand what it takes to enable the good (patterns).<\/span><\/p>\n<h3><strong>1. A series of impossible events<\/strong><\/h3>\n<h3><b>Solution: Defined time series rules<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">From healthcare records to financial transactions, to student progress reports, data across industries and platforms is rich with event pipelines. Events can trigger actions in your product and reveal the success of your user journey. They\u2019re a fundamental part of the user experience and the data that experience creates. 
For accurate testing, they need to be realistic.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Event pipelines generated at random inevitably create impossible time series, such as a hospital discharge dated before the admission. A quality solution allows you to define the relationships between events in your data by linking related fields and dictating the order in which they occur. For the highest degree of accuracy, an event generator should also mirror the distribution of the dates in your original dataset. This takes a combination of complex algorithms on the back end and customizable rules on the front end.<\/span><\/p>\n<h3><strong>2. Random categorical shuffling<\/strong><\/h3>\n<h3><b>Solution: Shuffling with defined ratios<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">A common way to obfuscate real data is to shuffle categorical values, for example the job titles of employees within an organization. The risk is that shuffling wipes out the integrity of the data if the ratios between categories, and their relationships to other fields within your dataset, aren\u2019t preserved. For example, imagine you\u2019re generating a synthetic workforce. Random generation might come up with 20 assistants for a single manager or 20 managers with a single assistant.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The ratios and relationships between categories determine whether the data you generate can simulate real-world situations. A well-designed algorithm for categorical shuffling must take them into account so that the distributions of categorical data it generates mirror those in your original data.<\/span><\/p>\n<h3><strong>3. Unmapped relationships<\/strong><\/h3>\n<h3><b>Solution: Column linking<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">The vast majority of data involves logical relationships that any human would immediately recognize, but random generators will not reproduce these relationships unless they are given rules to do so.
When the underlying data does not reflect real-world relationships, your testing cannot reflect real-world usage. Failing to link columns that have a defined relationship during generation can create anti-patterns where you least expect them, both in your testing and in your product.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To avoid this hazard, you need the ability to link as many columns as necessary so that dependencies are captured and the stories in your data ring true. For example, in a table of payroll data, bonuses become a function of salaries, which are tied to job titles, which in turn are partitioned by office location.<\/span><\/p>\n<h3><strong>4. Inconsistent transformations<\/strong><\/h3>\n<h3><b>Solution: Input-to-output consistency<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Even when anonymizing data, it\u2019s often important to anonymize certain values in the same way throughout your dataset. Inconsistent data transformations can easily break your data to the point that it\u2019s no longer usable. De-identifying data consistently is the pattern you need. It means that the same input always maps to the same output throughout your database, allowing you to preserve the cardinality of a column, match duplicate data across databases, or fully anonymize a field and still use it in a join.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Perhaps you have a user database that contains a username in both a column and a JSON blob, as well as another database that contains that user\u2019s website activity. Consistency enables you to safely anonymize the username while keeping that identifier the same in every location.<\/span><\/p>\n<h3><strong>5. 
Sensitive data leakage<\/strong><\/h3>\n<h3><b>Solution: Identify and flag PII\/PHI<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">When you\u2019re dealing with personally identifiable information (PII) or protected health information (PHI), your company typically has a legal obligation to maintain data privacy. The first step in de-identifying PII is identifying columns containing sensitive information and flagging them as needing protection throughout your database. An algorithm can do this quickly and at scale, but it must be carefully built. Imagine a column of birthdates named student_BD instead of birthdate or DOB. A de-identification system that relies only on column names to find PII may not flag that column as sensitive, and a data privacy anti-pattern is born.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">An effective de-identification system uses machine learning to examine both column names and the data within those columns to determine what may or may not be PII. Once the PII is identified, the system must flag it in a way that ensures it cannot slip through unprotected.<\/span><\/p>\n<h3><strong>6. Unaccounted-for schema changes<\/strong><\/h3>\n<h3><b>Solution: Flagging schema changes and refreshing test data on demand<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Schema changes are the only constant in modern data ecosystems. Failing to account for these changes, even seemingly minor ones, can lead to failures in your automated testing and, equally important, risky data leaks. In the best-case scenario, your test data may simply no longer work. In the worst case, a newly added column passes through untransformed and you\u2019ve now got sensitive data in your lower environments.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The pattern you need here is a tool built into your data generation pipeline that alerts you to any schema changes as they come through. Better yet, it should require you to update your generation model before pulling new data into staging.
An ideal system will also allow you to refresh your data on demand, multiple times a day, so your test data truly mirrors production at all times, schema included.<\/span><\/p>\n<h3><strong>7. Outliers revealing TMI<\/strong><\/h3>\n<h3><b>Solution: Adding noise with differential privacy<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">All data has its outliers. The more precise your data anonymization methods are, the more likely they are to pull those outliers through, and outliers can be used to re-identify individuals if the anonymized data is combined with other available resources. When outliers aren\u2019t taken into consideration, they serve as bold clues to the very information synthetic data is designed to protect.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The solution here is differential privacy, which adds calibrated noise during generation to create a more tempered pattern that obscures outliers. Differential privacy is a mathematical property of an algorithm: it limits how much any single individual\u2019s record can influence the output, and therefore how much the output can reveal about that individual. The more algorithms within your data generation process that can be made differentially private, the safer your outliers will be.<\/span><\/p>\n<h3><strong>8. Insufficient integration<\/strong><\/h3>\n<h3><b>Solution: Cross-functional APIs and seamless integration into CI\/CD pipelines<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Given the nature of tech stacks today, integration should be a key feature of any system you put in place. Whether a data automation tool has an API shouldn\u2019t even be a question you have to ask.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When it comes to building a data generation tool in-house, the process almost always involves writing scripts, and those scripts are consistently prone to failure.
Building a solution in-house isn\u2019t just a matter of the initial lift; it also requires continuous maintenance to keep the system up and running.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Your data de-identification infrastructure should enable developers to move faster, not weigh them down with double work. A data mimicking tool equips your developers to do their best work when it has an API, connects to any data source, integrates seamlessly into your existing systems, and works with your data no matter how it changes over time. As your data needs evolve, so should the systems that support them.<\/span><\/p>\n<h3><strong>9. Vendor lock-in<\/strong><\/h3>\n<h3><b>Solution: Support for all data sources<\/b><\/h3>\n<p><span style=\"font-weight: 400;\">Relational databases still dominate the world of big data, but NoSQL databases like MongoDB and Cassandra are gaining fast. Even PostgreSQL can store and query JSON documents natively through its JSON and JSONB column types. The future is hybrid. Your data de-identification infrastructure needs to support that.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When it comes to building this infrastructure in-house, you may find yourself dedicating significant resources to creating a process that works with PostgreSQL, only to end up back at square one when your company adds Redshift to the stack. And if you\u2019re using MongoDB, you\u2019re going to require an entirely different approach. What\u2019s more, your data may live in separate database types, but that doesn\u2019t mean it isn\u2019t interrelated. Your solution has to work not only for multiple databases but also <\/span><i><span style=\"font-weight: 400;\">between<\/span><\/i><span style=\"font-weight: 400;\"> them.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Given today\u2019s ever-expanding data ecosystems, it simply doesn\u2019t make sense to build a system that works with only one database type.
Your data generation solution should work as seamlessly with Postgres as it does with Redshift, Databricks, DB2, and MongoDB. Anything less is just a roadblock to data management. Seek out a tool that will work with your data wherever you keep it, now or in the future.<\/span><\/p>\n<h2><strong>Top Takeaways<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">Building a high-quality data mimicking and de-identification solution that satisfies all of the above is a major investment of resources, and with both data privacy and data utility on the line, the stakes are incredibly high. The ultimate anti-pattern may very well be burdening your team with all of these requirements in-house or settling for a solution that fails to deliver in any of these areas. The ultimate pattern? Use the above nine sections as a guide to building your own checklist, then seek out a proven platform that is ready to meet all of your team\u2019s fake data needs.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are wrong ways to fake your data. Whether you\u2019re bootstrapping a dev environment, automating integration tests, or capacity testing in staging, we all need high-quality fake data. Regardless of your use-case, common data generation pitfalls can break your testing or, worse, leak sensitive data into unsecured environments. 
There are ways to generate synthetic test<\/p>\n","protected":false},"author":79,"featured_media":11872,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"9 Fake Data Anti-patterns and How to Avoid Them by Chiara Colombi @tonicfakedata","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[1363,83,2143],"tags":[1124,2967,5064,5065,16,502,121,5066],"class_list":["post-11868","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-all-episodes","category-articles","category-exclusive-content","tag-api","tag-cd-pipelines","tag-chiara-colombi","tag-fake-data","tag-json","tag-postgres","tag-sql","tag-tonic-ai"],"jetpack_publicize_connections":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.8 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>9 Fake Data Anti-patterns and How to Avoid Them - Software Engineering Daily<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"9 Fake Data Anti-patterns and 
How to Avoid Them - Software Engineering Daily\" \/>\n<meta property=\"og:description\" content=\"There are wrong ways to fake your data. Whether you\u2019re bootstrapping a dev environment, automating integration tests, or capacity testing in staging, we all need high-quality fake data. Regardless of your use-case, common data generation pitfalls can break your testing or, worse, leak sensitive data into unsecured environments. There are ways to generate synthetic test\" \/>\n<meta property=\"og:url\" content=\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/\" \/>\n<meta property=\"og:site_name\" content=\"Software Engineering Daily\" \/>\n<meta property=\"article:published_time\" content=\"2021-09-29T18:00:06+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-09-28T21:51:23+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/09\/Tonic_Logo-07_for_greenhouse.png?fit=1200%2C720&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"720\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Chiara Colombi\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@software_daily\" \/>\n<meta name=\"twitter:site\" content=\"@software_daily\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Chiara Colombi\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/\"},\"author\":{\"name\":\"Chiara Colombi\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/43d984101022ed8ae5392854a0848d13\"},\"headline\":\"9 Fake Data Anti-patterns and How to Avoid Them\",\"datePublished\":\"2021-09-29T18:00:06+00:00\",\"dateModified\":\"2021-09-28T21:51:23+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/\"},\"wordCount\":1707,\"publisher\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/09\/Tonic_Logo-07_for_greenhouse.png?fit=1200%2C720&ssl=1\",\"keywords\":[\"API\",\"cd pipelines\",\"Chiara Colombi\",\"fake data\",\"json\",\"Postgres\",\"SQL\",\"Tonic.ai\"],\"articleSection\":[\"All Content\",\"Exclusive Articles\",\"Exclusive Content\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/\",\"url\":\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/\",\"name\":\"9 Fake Data Anti-patterns and How to Avoid Them - Software Engineering 
Daily\",\"isPartOf\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/09\/Tonic_Logo-07_for_greenhouse.png?fit=1200%2C720&ssl=1\",\"datePublished\":\"2021-09-29T18:00:06+00:00\",\"dateModified\":\"2021-09-28T21:51:23+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#primaryimage\",\"url\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/09\/Tonic_Logo-07_for_greenhouse.png?fit=1200%2C720&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/09\/Tonic_Logo-07_for_greenhouse.png?fit=1200%2C720&ssl=1\",\"width\":1200,\"height\":720},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/softwareengineeringdaily.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"9 Fake Data Anti-patterns and How to Avoid 
Them\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#website\",\"url\":\"https:\/\/softwareengineeringdaily.com\/\",\"name\":\"Software Engineering Daily\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/softwareengineeringdaily.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\",\"name\":\"Software Engineering Daily\",\"url\":\"https:\/\/softwareengineeringdaily.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2024\/01\/cropped-sed_website_banner.png?fit=549%2C169&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2024\/01\/cropped-sed_website_banner.png?fit=549%2C169&ssl=1\",\"width\":549,\"height\":169,\"caption\":\"Software Engineering Daily\"},\"image\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/software_daily\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/43d984101022ed8ae5392854a0848d13\",\"name\":\"Chiara Colombi\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/659dcbadfea6cdc01349c07de25baa12?s=96&d=retro&r=pg\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/659dcbadfea6cdc01349c07de25baa12?s=96&d=retro&r=pg\",\"caption\":\"Chiara 
Colombi\"},\"url\":\"https:\/\/softwareengineeringdaily.com\/author\/chiara\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"9 Fake Data Anti-patterns and How to Avoid Them - Software Engineering Daily","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/","og_locale":"en_US","og_type":"article","og_title":"9 Fake Data Anti-patterns and How to Avoid Them - Software Engineering Daily","og_description":"There are wrong ways to fake your data. Whether you\u2019re bootstrapping a dev environment, automating integration tests, or capacity testing in staging, we all need high-quality fake data. Regardless of your use-case, common data generation pitfalls can break your testing or, worse, leak sensitive data into unsecured environments. There are ways to generate synthetic test","og_url":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/","og_site_name":"Software Engineering Daily","article_published_time":"2021-09-29T18:00:06+00:00","article_modified_time":"2021-09-28T21:51:23+00:00","og_image":[{"width":1200,"height":720,"url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/09\/Tonic_Logo-07_for_greenhouse.png?fit=1200%2C720&ssl=1","type":"image\/png"}],"author":"Chiara Colombi","twitter_card":"summary_large_image","twitter_creator":"@software_daily","twitter_site":"@software_daily","twitter_misc":{"Written by":"Chiara Colombi","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#article","isPartOf":{"@id":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/"},"author":{"name":"Chiara Colombi","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/43d984101022ed8ae5392854a0848d13"},"headline":"9 Fake Data Anti-patterns and How to Avoid Them","datePublished":"2021-09-29T18:00:06+00:00","dateModified":"2021-09-28T21:51:23+00:00","mainEntityOfPage":{"@id":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/"},"wordCount":1707,"publisher":{"@id":"https:\/\/softwareengineeringdaily.com\/#organization"},"image":{"@id":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/09\/Tonic_Logo-07_for_greenhouse.png?fit=1200%2C720&ssl=1","keywords":["API","cd pipelines","Chiara Colombi","fake data","json","Postgres","SQL","Tonic.ai"],"articleSection":["All Content","Exclusive Articles","Exclusive Content"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/","url":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/","name":"9 Fake Data Anti-patterns and How to Avoid Them - Software Engineering 
Daily","isPartOf":{"@id":"https:\/\/softwareengineeringdaily.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#primaryimage"},"image":{"@id":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/09\/Tonic_Logo-07_for_greenhouse.png?fit=1200%2C720&ssl=1","datePublished":"2021-09-29T18:00:06+00:00","dateModified":"2021-09-28T21:51:23+00:00","breadcrumb":{"@id":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#primaryimage","url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/09\/Tonic_Logo-07_for_greenhouse.png?fit=1200%2C720&ssl=1","contentUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/09\/Tonic_Logo-07_for_greenhouse.png?fit=1200%2C720&ssl=1","width":1200,"height":720},{"@type":"BreadcrumbList","@id":"https:\/\/softwareengineeringdaily.com\/2021\/09\/29\/9-fake-data-anti-patterns-and-how-to-avoid-them\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/softwareengineeringdaily.com\/"},{"@type":"ListItem","position":2,"name":"9 Fake Data Anti-patterns and How to Avoid Them"}]},{"@type":"WebSite","@id":"https:\/\/softwareengineeringdaily.com\/#website","url":"https:\/\/softwareengineeringdaily.com\/","name":"Software Engineering 
Daily","description":"","publisher":{"@id":"https:\/\/softwareengineeringdaily.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/softwareengineeringdaily.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/softwareengineeringdaily.com\/#organization","name":"Software Engineering Daily","url":"https:\/\/softwareengineeringdaily.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2024\/01\/cropped-sed_website_banner.png?fit=549%2C169&ssl=1","contentUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2024\/01\/cropped-sed_website_banner.png?fit=549%2C169&ssl=1","width":549,"height":169,"caption":"Software Engineering Daily"},"image":{"@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/software_daily"]},{"@type":"Person","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/43d984101022ed8ae5392854a0848d13","name":"Chiara Colombi","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/659dcbadfea6cdc01349c07de25baa12?s=96&d=retro&r=pg","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/659dcbadfea6cdc01349c07de25baa12?s=96&d=retro&r=pg","caption":"Chiara 
Colombi"},"url":"https:\/\/softwareengineeringdaily.com\/author\/chiara\/"}]}},"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/09\/Tonic_Logo-07_for_greenhouse.png?fit=1200%2C720&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p7GuoD-35q","_links":{"self":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts\/11868"}],"collection":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/users\/79"}],"replies":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/comments?post=11868"}],"version-history":[{"count":0,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts\/11868\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/media\/11872"}],"wp:attachment":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/media?parent=11868"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/categories?post=11868"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/tags?post=11868"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}