{"id":8813,"date":"2020-02-18T08:00:10","date_gmt":"2020-02-18T16:00:10","guid":{"rendered":"http:\/\/softwareengineeringdaily.com\/?p=8813"},"modified":"2020-11-10T14:00:37","modified_gmt":"2020-11-10T22:00:37","slug":"linkedin-data-infrastructure","status":"publish","type":"post","link":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/","title":{"rendered":"LinkedIn Data Infrastructure"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">LinkedIn has become a staple for the modern professional, whether it\u2019s used for searching for a new job, reading industry news, or keeping up with professional connections.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As a rapidly growing platform that serves <\/span><a href=\"https:\/\/about.linkedin.com\/\"><span style=\"font-weight: 400;\">more than 675 million users<\/span><\/a><span style=\"font-weight: 400;\"> today, LinkedIn can boast one of the largest user bases in the world. How these users interact with the site and react to recommendations aggregates into a massive dataset. Data at this scale, which few companies experience, brings interesting engineering problems and opens up ripe opportunities for innovation in areas like data infrastructure and tooling.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Even though LinkedIn is a 16-year-old company, its data infrastructure journey is far from over. LinkedIn\u2019s infrastructure quest has covered a wide range of practices, from running <\/span><a href=\"https:\/\/engineering.linkedin.com\/blog\/2015\/12\/data-center-learnings--what-others-can-learn-from-our\"><span style=\"font-weight: 400;\">approximately 20 servers in a small data center in 2008<\/span><\/a><span style=\"font-weight: 400;\"> to building smarter data centers around the world and, as of July 2019, beginning a multi-year migration to the public cloud with Azure. 
Throughout this journey, LinkedIn engineers have faced a variety of challenges, documented their solutions as lessons learned along the way, and built and open-sourced widely adopted tools such as Kafka and Voldemort.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In the early years of LinkedIn, the data infrastructure relied on a single data center, hosted with a retail data center provider. In those days, with all data served from a single data center, the priority was availability: <\/span><a href=\"https:\/\/www.linkedin.com\/pulse\/site-up-benjamin-purgason\/\"><span style=\"font-weight: 400;\">keeping the site up for users<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As the number of users grew and new features were released, adding data center capacity through a retail provider became less cost-effective. This is when LinkedIn started its own data center, gradually fanning out to rely on multiple data centers. By not only expanding its data centers in number, but also <\/span><a href=\"https:\/\/engineering.linkedin.com\/blog\/2016\/03\/project-altair--the-evolution-of-linkedins-data-center-network\"><span style=\"font-weight: 400;\">designing the fabric of the data centers in a smart way<\/span><\/a><span style=\"font-weight: 400;\">, LinkedIn grew into its modern infrastructure, able to handle millions of users.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">LinkedIn has taken a multi-pronged approach to handling growth. 
The most prominent strategies have been expanding the number and the capacity of data centers, building smarter data centers, and creating tooling around massive data so that data can be integrated into workflows faster and propel innovation.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2 style=\"text-align: center;\"><b>Data Sources at LinkedIn<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">LinkedIn has three main sources of data, as Kapil Surlaker explains in our <\/span><a href=\"https:\/\/softwareengineeringdaily.com\/2019\/11\/07\/linkedin-data-engineering-with-kapil-surlaker\/\"><span style=\"font-weight: 400;\">episode on the company\u2019s data infrastructure<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The first source is transactional data from the users: every action taken by a user, from status updates to post \u201clikes\u201d and job views, must be stored. The second source is telemetry data, which comes from monitoring applications to gain insight into how the different components of the platform are performing. The third source, one without an upper bound according to Surlaker, is derived data, generated by developers for numerous purposes such as data sets to be used for analysis and building machine learning models.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">These types of data are common for web applications with user interactions. Things get complicated when data has to be consolidated in a standard format to enable a unified experience for the developers in a company.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The data sources can differ widely: historical data usually comes from RDBMSs designed for OLAP, current transactional data comes from NoSQL databases and streams, and logs can be delivered in a variety of formats. 
The paradigm in which the data arrives also matters: ingesting streaming data and processing batch data may have different requirements.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">LinkedIn\u2019s main answer to handling these diverse sets of data has been tooling. Luckily for the general developer community, many of these tools have been open-sourced over the years.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2 style=\"text-align: center;\"><b>Open Source Tools<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">One of LinkedIn\u2019s strategies for dealing with the massive amounts of data that are constantly being generated is to empower engineers by developing tools that handle different aspects of the data, from ingestion to storage.<\/span><\/p>\n<p><a href=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image3-2.png?ssl=1\"><img fetchpriority=\"high\" decoding=\"async\" data-attachment-id=\"8816\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/image3-24\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image3-2.png?fit=1353%2C1305&amp;ssl=1\" data-orig-size=\"1353,1305\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image3\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image3-2.png?fit=300%2C289&amp;ssl=1\" 
data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image3-2.png?fit=1024%2C988&amp;ssl=1\" class=\"aligncenter wp-image-8816 \" src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image3-2.png?resize=509%2C491&#038;ssl=1\" alt=\"\" width=\"509\" height=\"491\" srcset=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image3-2.png?w=1353&amp;ssl=1 1353w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image3-2.png?resize=300%2C289&amp;ssl=1 300w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image3-2.png?resize=1024%2C988&amp;ssl=1 1024w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image3-2.png?resize=768%2C741&amp;ssl=1 768w\" sizes=\"(max-width: 509px) 100vw, 509px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">LinkedIn has <\/span><a href=\"https:\/\/engineering.linkedin.com\/blog\/topic\/open-source\"><span style=\"font-weight: 400;\">built and open-sourced a variety of tools over the years<\/span><\/a><span style=\"font-weight: 400;\">. One of these tools, <\/span><a href=\"https:\/\/kafka.apache.org\/\"><span style=\"font-weight: 400;\">Kafka<\/span><\/a><span style=\"font-weight: 400;\">, built by LinkedIn and donated to Apache Software Foundation, forms the backbone of data operations at LinkedIn alongside Hadoop. Kafka, a distributed streaming platform, acts as a low-latency data collection system for the real-time data generated by LinkedIn\u2019s user base.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Complementing Kafka is a tool called <\/span><a href=\"https:\/\/gobblin.apache.org\/\"><span style=\"font-weight: 400;\">Gobblin<\/span><\/a><span style=\"font-weight: 400;\">, a distributed data integration framework. 
Gobblin is used to <\/span><a href=\"https:\/\/www.slideshare.net\/ShirshankaDas\/apache-gobblin-bridging-batch-and-streaming-data-integration-big-data-meetup-2017\"><span style=\"font-weight: 400;\">ease and unify the integration of data<\/span><\/a><span style=\"font-weight: 400;\"> between different sources and sinks, providing scalability, fault tolerance, and quality assurance in one tool. <\/span><a href=\"https:\/\/engineering.linkedin.com\/data-ingestion\/gobblin-big-data-ease\"><span style=\"font-weight: 400;\">Developed initially to serve as an \u201cuber-ingestion framework\u201d for Hadoop at LinkedIn<\/span><\/a><span style=\"font-weight: 400;\">, Gobblin was open-sourced and donated to Apache where it has taken on new integrations and a diverse community of committers.<\/span><\/p>\n<p><a href=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image5-3.png?ssl=1\"><img decoding=\"async\" data-attachment-id=\"8818\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/image5-18\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image5-3.png?fit=638%2C359&amp;ssl=1\" data-orig-size=\"638,359\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image5\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image5-3.png?fit=300%2C169&amp;ssl=1\" 
data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image5-3.png?fit=638%2C359&amp;ssl=1\" class=\"aligncenter wp-image-8818 \" src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image5-3.png?resize=505%2C284&#038;ssl=1\" alt=\"\" width=\"505\" height=\"284\" srcset=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image5-3.png?w=638&amp;ssl=1 638w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image5-3.png?resize=300%2C169&amp;ssl=1 300w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image5-3.png?resize=269%2C151&amp;ssl=1 269w\" sizes=\"(max-width: 505px) 100vw, 505px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<h2 style=\"text-align: center;\"><b>Project InVersion<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">In the fast-moving world of startups, technical debt is often overlooked. It refers to an <\/span><a href=\"https:\/\/martinfowler.com\/bliki\/TechnicalDebt.html\"><span style=\"font-weight: 400;\">accumulation of deficiencies that make it harder to add new features to the system<\/span><\/a><span style=\"font-weight: 400;\">. The most common way of accumulating technical debt is releasing features quickly without thinking about the future sustainability of the overall system, a practice common among startups that are looking to attract users and investors with shiny new features.<\/span><\/p>\n<p><a href=\"https:\/\/martinfowler.com\/bliki\/TechnicalDebtQuadrant.html\"><span style=\"font-weight: 400;\">Technical debt occurs in many ways<\/span><\/a><span style=\"font-weight: 400;\">, and it is not always easy to prevent. Developers at LinkedIn faced their technical debt the hard way.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In 2011, after the company\u2019s initial public offering, LinkedIn\u2019s technical debt hit a critical point. 
Long-standing infrastructure practices, and problems that compounded as new features were layered on top of them, could no longer be contained. LinkedIn opted for a risky infrastructure overhaul, now referred to as Project InVersion.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For two months in 2011, <\/span><a href=\"https:\/\/www.linkedin.com\/pulse\/when-your-tech-debt-comes-due-kevin-scott\/\"><span style=\"font-weight: 400;\">LinkedIn stopped rolling out new features<\/span><\/a><span style=\"font-weight: 400;\"> as developers focused on improving and modernizing the infrastructure, a full team effort to pay down the technical debt of the previous eight years. This overhaul included developing new tools that automated testing and accelerated the process of rolling out features and updating the platform; in the end, it completely transformed LinkedIn\u2019s backbone.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2 style=\"text-align: center;\"><b>Challenges with ML<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">LinkedIn offers a personalized experience to each of its users. The way that <\/span><a href=\"https:\/\/engineering.linkedin.com\/blog\/2018\/03\/a-look-behind-the-ai-that-powers-linkedins-feed--sifting-through\"><span style=\"font-weight: 400;\">posts in their feed are sorted<\/span><\/a><span style=\"font-weight: 400;\">, the job recommendations they see, and other recommendations need to be tailored to each person on the platform. 
Machine learning models are the main power behind these operations.<\/span><\/p>\n<p><a href=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image1-2.png?ssl=1\"><img decoding=\"async\" data-attachment-id=\"8814\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/image1-29\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image1-2.png?fit=1999%2C977&amp;ssl=1\" data-orig-size=\"1999,977\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image1\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image1-2.png?fit=300%2C147&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image1-2.png?fit=1024%2C500&amp;ssl=1\" class=\"aligncenter wp-image-8814 \" src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image1-2.png?resize=512%2C250&#038;ssl=1\" alt=\"\" width=\"512\" height=\"250\" srcset=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image1-2.png?w=1999&amp;ssl=1 1999w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image1-2.png?resize=300%2C147&amp;ssl=1 300w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image1-2.png?resize=1024%2C500&amp;ssl=1 1024w, 
https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image1-2.png?resize=768%2C375&amp;ssl=1 768w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image1-2.png?resize=1536%2C751&amp;ssl=1 1536w\" sizes=\"(max-width: 512px) 100vw, 512px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><i><span style=\"font-weight: 400;\">An example of recommendations on LinkedIn, powered by AI.<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">LinkedIn has many teams, each focused on an ML application area, from Feeds to Communities. Each of these areas poses unique challenges in defining the right objectives, applying the correct modeling technique, and successfully serving complex models with low latency at scale. Each model must be tightly integrated within the serving stack specific to its problem space. At the same time, there must be a single unified framework that provides a battery of tools to solve the myriad challenges that come with dealing with complex models that operate on a very large set of data.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">LinkedIn\u2019s solution is Pro-ML.<\/span><\/p>\n<blockquote><p><span style=\"font-weight: 400;\">The goal of Pro-ML is to double the effectiveness of machine learning engineers while simultaneously opening the tools for AI and modeling to engineers from across the LinkedIn stack.<\/span><\/p><\/blockquote>\n<p style=\"text-align: center;\"><a href=\"https:\/\/engineering.linkedin.com\/blog\/2019\/01\/scaling-machine-learning-productivity-at-linkedin\"><span style=\"font-weight: 400;\">Source<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400;\">The Pro-ML approach divides ML practices into layers as part of the machine learning development lifecycle:<\/span><br \/>\n<a href=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image6-1.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" 
data-attachment-id=\"8819\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/image6-13\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image6-1.png?fit=1970%2C1106&amp;ssl=1\" data-orig-size=\"1970,1106\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image6\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image6-1.png?fit=300%2C168&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image6-1.png?fit=1024%2C575&amp;ssl=1\" class=\"aligncenter wp-image-8819 \" src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image6-1.png?resize=504%2C283&#038;ssl=1\" alt=\"\" width=\"504\" height=\"283\" srcset=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image6-1.png?w=1970&amp;ssl=1 1970w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image6-1.png?resize=300%2C169&amp;ssl=1 300w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image6-1.png?resize=1024%2C575&amp;ssl=1 1024w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image6-1.png?resize=768%2C431&amp;ssl=1 768w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image6-1.png?resize=1536%2C862&amp;ssl=1 1536w, 
https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image6-1.png?resize=269%2C151&amp;ssl=1 269w\" sizes=\"(max-width: 504px) 100vw, 504px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Each of these layers is a step towards building machine learning models for production. LinkedIn finds it helpful to standardize these steps so that engineers across teams can share innovations by simply swapping components with one another. Pro-ML also provides automation and additional hints to help engineers find mistakes in their models faster.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In machine learning parlance, a \u201cfeature\u201d is a piece of the data that the model uses to make a prediction. An example might be how many connections a user has in common with someone who posted an item in their feed. Features used in various machine learning models are collected into the Feature Marketplace in a searchable format. These features are available at prediction time, when the user visits the site, but must be simulated when testing out an idea during model training. LinkedIn has had many challenges in the past with ensuring features are computed the same way during model training and prediction. 
Pro-ML offers a tool called Frame that unifies feature access and computation in all of the environments.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">LinkedIn also has several open-source tools to integrate machine learning workflows into their infrastructure needs, such as <\/span><a href=\"https:\/\/github.com\/linkedin\/TonY\"><span style=\"font-weight: 400;\">TonY<\/span><\/a><span style=\"font-weight: 400;\"> and <\/span><a href=\"https:\/\/github.com\/linkedin\/photon-ml\"><span style=\"font-weight: 400;\">Photon ML<\/span><\/a><span style=\"font-weight: 400;\">.\u00a0<\/span><br \/>\n<a href=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image7-2.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"8820\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/image7-12\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image7-2.png?fit=700%2C266&amp;ssl=1\" data-orig-size=\"700,266\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image7\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image7-2.png?fit=300%2C114&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image7-2.png?fit=700%2C266&amp;ssl=1\" class=\"aligncenter wp-image-8820 \" 
src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image7-2.png?resize=503%2C191&#038;ssl=1\" alt=\"\" width=\"503\" height=\"191\" srcset=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image7-2.png?w=700&amp;ssl=1 700w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image7-2.png?resize=300%2C114&amp;ssl=1 300w\" sizes=\"(max-width: 503px) 100vw, 503px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p><span style=\"font-weight: 400;\">TonY, originally an acronym for TensorFlow on YARN, was developed <\/span><a href=\"https:\/\/engineering.linkedin.com\/blog\/2018\/09\/open-sourcing-tony--native-support-of-tensorflow-on-hadoop\"><span style=\"font-weight: 400;\">out of a need to run distributed deep learning training jobs on large Hadoop clusters<\/span><\/a><span style=\"font-weight: 400;\">. Because other options such as TensorFlow on Spark fell short of LinkedIn\u2019s specific needs, for example by lacking GPU scheduling, an internal tool was created and later open-sourced. TonY currently supports not only TensorFlow, but also PyTorch and MXNet.<\/span><\/p>\n<p><a href=\"https:\/\/engineering.linkedin.com\/blog\/2016\/06\/open-sourcing-photon-ml\"><span style=\"font-weight: 400;\">Photon ML<\/span><\/a><span style=\"font-weight: 400;\">, a machine learning library on Spark, was built out of similar needs. Rather than deep learning, Photon ML focuses on Generalized Linear Models and Generalized Linear Mixed Models (GLMix). 
These models built by Photon ML power features where response prediction is useful, namely for recommendation components such as job recommendation, feed ranking, and \u201cPeople You May Know.\u201d<\/span><\/p>\n<p><a href=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image2-2.png?ssl=1\"><img loading=\"lazy\" decoding=\"async\" data-attachment-id=\"8815\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/image2-29\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image2-2.png?fit=782%2C842&amp;ssl=1\" data-orig-size=\"782,842\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image2\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image2-2.png?fit=279%2C300&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image2-2.png?fit=782%2C842&amp;ssl=1\" class=\"aligncenter wp-image-8815 \" src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image2-2.png?resize=505%2C544&#038;ssl=1\" alt=\"\" width=\"505\" height=\"544\" srcset=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image2-2.png?w=782&amp;ssl=1 782w, https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image2-2.png?resize=279%2C300&amp;ssl=1 279w, 
https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/image2-2.png?resize=768%2C827&amp;ssl=1 768w\" sizes=\"(max-width: 505px) 100vw, 505px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<h2 style=\"text-align: center;\"><b>Journey to Cloud<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">LinkedIn <\/span><a href=\"https:\/\/engineering.linkedin.com\/blog\/2019\/building-next-infra\"><span style=\"font-weight: 400;\">has been using Azure for some of its operations<\/span><\/a><span style=\"font-weight: 400;\">, such as Microsoft\u2019s <\/span><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/content-moderator\/\"><span style=\"font-weight: 400;\">Content Moderator APIs<\/span><\/a><span style=\"font-weight: 400;\"> (part of Cognitive Services) for detecting inappropriate content and <\/span><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/text-analytics\/\"><span style=\"font-weight: 400;\">Text Analytics APIs<\/span><\/a><span style=\"font-weight: 400;\"> for <\/span><a href=\"https:\/\/engineering.linkedin.com\/blog\/2018\/06\/dynamic-machine-translation-in-the-linkedin-feed-\"><span style=\"font-weight: 400;\">machine translation<\/span><\/a><span style=\"font-weight: 400;\">. The choice to use Azure services from Cognitive Services is an important point: LinkedIn has proven over the years, through numerous projects built and open-sourced by its engineers, that the company is not averse to tackling a problem from the root and developing the necessary solution. There is a trade-off between the developer effort put in by engineers at LinkedIn and the cost of using a service from a provider. 
Beyond this trade-off, however, comes the question of reliability and scale, especially for a company like LinkedIn, unique in the amount of data and number of users its platform serves.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Recently, Mohak Shroff, LinkedIn\u2019s Senior VP of Engineering, <\/span><a href=\"https:\/\/engineering.linkedin.com\/blog\/2019\/building-next-infra\"><span style=\"font-weight: 400;\">announced<\/span><\/a><span style=\"font-weight: 400;\"> that the company will be making the switch to the public cloud with Azure. This is a critical move, and a deliberate one, according to Shroff: after periodically weighing the pros and cons of the public cloud from multiple angles, ranging from applicability to the bare economics, the company recently decided that it would be a worthy next step.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">These considerations are significant. The decisions to use Azure services show the company\u2019s trust in Azure to handle some of the data operations on the scale of LinkedIn.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To learn more about what the engineers over at LinkedIn are building to connect the world\u2019s professionals, check out the company\u2019s <\/span><a href=\"https:\/\/engineering.linkedin.com\/blog\"><span style=\"font-weight: 400;\">blog<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>LinkedIn has become a staple for the modern professional, whether it\u2019s used for searching for a new job, reading industry news, or keeping up with professional connections.\u00a0 As a rapidly growing platform that serves more than 675 million users today, LinkedIn is a company that can boast of having one of the largest user 
bases<\/p>\n","protected":false},"author":15,"featured_media":10159,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[1363,83,2143],"tags":[2120,323,3510,2402,3329,336,311,282],"class_list":["post-8813","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-all-episodes","category-articles","category-exclusive-content","tag-data-infrastructure","tag-deep-learning","tag-gobblin","tag-gokhan-simsek","tag-kapil-surlaker","tag-linkedin","tag-machine-learning","tag-open-source"],"jetpack_publicize_connections":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.8 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>LinkedIn Data Infrastructure - Software Engineering Daily<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"LinkedIn Data Infrastructure - Software Engineering Daily\" \/>\n<meta property=\"og:description\" content=\"LinkedIn has become a staple for 
the modern professional, whether it\u2019s used for searching for a new job, reading industry news, or keeping up with professional connections.\u00a0 As a rapidly growing platform that serves more than 675 million users today, LinkedIn is a company that can boast of having one of the largest user bases\" \/>\n<meta property=\"og:url\" content=\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/\" \/>\n<meta property=\"og:site_name\" content=\"Software Engineering Daily\" \/>\n<meta property=\"article:published_time\" content=\"2020-02-18T16:00:10+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-11-10T22:00:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/LinkedIn.jpg?fit=2048%2C1024&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"2048\" \/>\n\t<meta property=\"og:image:height\" content=\"1024\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Gokhan Simsek\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@GokhanSimseek\" \/>\n<meta name=\"twitter:site\" content=\"@software_daily\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Gokhan Simsek\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/\"},\"author\":{\"name\":\"Gokhan Simsek\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/e890d5bb8941fc76fb69909e6702c34d\"},\"headline\":\"LinkedIn Data Infrastructure\",\"datePublished\":\"2020-02-18T16:00:10+00:00\",\"dateModified\":\"2020-11-10T22:00:37+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/\"},\"wordCount\":1793,\"publisher\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/LinkedIn.jpg?fit=2048%2C1024&ssl=1\",\"keywords\":[\"data infrastructure\",\"Deep Learning\",\"Gobblin\",\"Gokhan Simsek\",\"Kapil Surlaker\",\"LinkedIn\",\"Machine Learning\",\"Open Source\"],\"articleSection\":[\"All Content\",\"Exclusive Articles\",\"Exclusive Content\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/\",\"url\":\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/\",\"name\":\"LinkedIn Data Infrastructure - Software Engineering 
Daily\",\"isPartOf\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/LinkedIn.jpg?fit=2048%2C1024&ssl=1\",\"datePublished\":\"2020-02-18T16:00:10+00:00\",\"dateModified\":\"2020-11-10T22:00:37+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#primaryimage\",\"url\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/LinkedIn.jpg?fit=2048%2C1024&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/LinkedIn.jpg?fit=2048%2C1024&ssl=1\",\"width\":2048,\"height\":1024},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/softwareengineeringdaily.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"LinkedIn Data Infrastructure\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#website\",\"url\":\"https:\/\/softwareengineeringdaily.com\/\",\"name\":\"Software Engineering 
Daily\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/softwareengineeringdaily.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\",\"name\":\"Software Engineering Daily\",\"url\":\"https:\/\/softwareengineeringdaily.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2024\/01\/cropped-sed_website_banner.png?fit=549%2C169&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2024\/01\/cropped-sed_website_banner.png?fit=549%2C169&ssl=1\",\"width\":549,\"height\":169,\"caption\":\"Software Engineering Daily\"},\"image\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/software_daily\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/e890d5bb8941fc76fb69909e6702c34d\",\"name\":\"Gokhan Simsek\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/9ad291f06753e23a47e536bcfe34701f?s=96&d=retro&r=pg\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/9ad291f06753e23a47e536bcfe34701f?s=96&d=retro&r=pg\",\"caption\":\"Gokhan Simsek\"},\"description\":\"Gokhan is a computer science graduate, currently pursuing a MSc. 
degree in Data Science at Eindhoven University of Technology.\",\"sameAs\":[\"https:\/\/x.com\/GokhanSimseek\"],\"url\":\"https:\/\/softwareengineeringdaily.com\/author\/gokhan\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"LinkedIn Data Infrastructure - Software Engineering Daily","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/","og_locale":"en_US","og_type":"article","og_title":"LinkedIn Data Infrastructure - Software Engineering Daily","og_description":"LinkedIn has become a staple for the modern professional, whether it\u2019s used for searching for a new job, reading industry news, or keeping up with professional connections.\u00a0 As a rapidly growing platform that serves more than 675 million users today, LinkedIn is a company that can boast of having one of the largest user bases","og_url":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/","og_site_name":"Software Engineering Daily","article_published_time":"2020-02-18T16:00:10+00:00","article_modified_time":"2020-11-10T22:00:37+00:00","og_image":[{"width":2048,"height":1024,"url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/LinkedIn.jpg?fit=2048%2C1024&ssl=1","type":"image\/jpeg"}],"author":"Gokhan Simsek","twitter_card":"summary_large_image","twitter_creator":"@GokhanSimseek","twitter_site":"@software_daily","twitter_misc":{"Written by":"Gokhan Simsek","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#article","isPartOf":{"@id":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/"},"author":{"name":"Gokhan Simsek","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/e890d5bb8941fc76fb69909e6702c34d"},"headline":"LinkedIn Data Infrastructure","datePublished":"2020-02-18T16:00:10+00:00","dateModified":"2020-11-10T22:00:37+00:00","mainEntityOfPage":{"@id":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/"},"wordCount":1793,"publisher":{"@id":"https:\/\/softwareengineeringdaily.com\/#organization"},"image":{"@id":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/LinkedIn.jpg?fit=2048%2C1024&ssl=1","keywords":["data infrastructure","Deep Learning","Gobblin","Gokhan Simsek","Kapil Surlaker","LinkedIn","Machine Learning","Open Source"],"articleSection":["All Content","Exclusive Articles","Exclusive Content"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/","url":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/","name":"LinkedIn Data Infrastructure - Software Engineering 
Daily","isPartOf":{"@id":"https:\/\/softwareengineeringdaily.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#primaryimage"},"image":{"@id":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/LinkedIn.jpg?fit=2048%2C1024&ssl=1","datePublished":"2020-02-18T16:00:10+00:00","dateModified":"2020-11-10T22:00:37+00:00","breadcrumb":{"@id":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#primaryimage","url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/LinkedIn.jpg?fit=2048%2C1024&ssl=1","contentUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/LinkedIn.jpg?fit=2048%2C1024&ssl=1","width":2048,"height":1024},{"@type":"BreadcrumbList","@id":"https:\/\/softwareengineeringdaily.com\/2020\/02\/18\/linkedin-data-infrastructure\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/softwareengineeringdaily.com\/"},{"@type":"ListItem","position":2,"name":"LinkedIn Data Infrastructure"}]},{"@type":"WebSite","@id":"https:\/\/softwareengineeringdaily.com\/#website","url":"https:\/\/softwareengineeringdaily.com\/","name":"Software Engineering 
Daily","description":"","publisher":{"@id":"https:\/\/softwareengineeringdaily.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/softwareengineeringdaily.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/softwareengineeringdaily.com\/#organization","name":"Software Engineering Daily","url":"https:\/\/softwareengineeringdaily.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2024\/01\/cropped-sed_website_banner.png?fit=549%2C169&ssl=1","contentUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2024\/01\/cropped-sed_website_banner.png?fit=549%2C169&ssl=1","width":549,"height":169,"caption":"Software Engineering Daily"},"image":{"@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/software_daily"]},{"@type":"Person","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/e890d5bb8941fc76fb69909e6702c34d","name":"Gokhan Simsek","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/9ad291f06753e23a47e536bcfe34701f?s=96&d=retro&r=pg","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/9ad291f06753e23a47e536bcfe34701f?s=96&d=retro&r=pg","caption":"Gokhan Simsek"},"description":"Gokhan is a computer science graduate, currently pursuing a MSc. 
degree in Data Science at Eindhoven University of Technology.","sameAs":["https:\/\/x.com\/GokhanSimseek"],"url":"https:\/\/softwareengineeringdaily.com\/author\/gokhan\/"}]}},"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2020\/02\/LinkedIn.jpg?fit=2048%2C1024&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p7GuoD-2i9","_links":{"self":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts\/8813"}],"collection":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/comments?post=8813"}],"version-history":[{"count":0,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts\/8813\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/media\/10159"}],"wp:attachment":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/media?parent=8813"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/categories?post=8813"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/tags?post=8813"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}