Name MPP Data Virtualization
Description Ben Szekely, Senior Vice President and Head of Field Operations, discusses how semantic metadata technology, powered by MPP capabilities, provides users with complete control over how they elevate their Data Fabric for discovery and integration. Then, he offers a sneak peek into new functionality coming to Anzo later in 2020.
Thumbnail URL https://embed-ssl.wistia.com/deliveries/b006ef1724ff614a6...
Embed URL https://fast.wistia.net/embed/iframe/l2kvdiv2xy
Duration PT378S
Upload Date 2020-04-08T03:47:49+00:00
Transcript
Hi everyone, I'm Ben Szekely, Senior Vice President and Head of Field Operations here at Cambridge Semantics, and today I'm going to give you a quick walkthrough of a new capability coming soon to the Anzo platform. Massively parallel processing (MPP) data virtualization is a true breakthrough for data integration within the context of the enterprise data fabric.
The enterprise data fabric is an architecture for modern data management that anticipates the need to connect data across the business. It's really an overlay that sits on top of your existing data warehouses, data lakes, cloud data repositories, document repositories, and relational database systems, and allows users to quickly blend and combine data from different data sources using business models. That data can then be consumed in any sort of application, BI or analytics tool, or through exploratory analytics directly in the underlying graph model itself. Now, this type of data integration has to be really fast, agile, and flexible, and work at real enterprise scale. Traditional approaches that move all the data into one place, like a data warehouse, just can't scale or keep up, and so something new is required. Let's take a look.
Pure federation-based approaches have been looked at as a fairly seductive way to try to solve this problem. You simply put a federated query engine as an overlay on top of all your data sources, you run a SQL query, and it magically queries all the underlying sources and brings your data back. But this approach has had problems at enterprise scale. It has required things like SQL caches to get practical performance, and maintaining that cache ultimately narrows down the use cases you can solve. It's also a single-threaded OLTP architecture, so each query can have very long runtimes, it relies heavily on these SQL caches, and ultimately it can be very taxing on the source systems if not done properly.
At Cambridge Semantics, we looked at these approaches a long time ago and decided early on that this wasn't going to be suitable for doing data integration at data fabric scale. So up through the current version, Anzo 5.0, we've taken a very practical approach to data integration at data fabric scale that makes use of our massively parallel graph engine, AnzoGraph.
What we do is use Spark jobs to rapidly pre-position data from data sources in a lightweight fashion. We use metadata and an intelligent understanding of the data sources to automatically onboard data into cheap, compressed graph storage. The graph storage is compressed, so it doesn't take up a lot of space, and the metadata cataloging gives administrators and users a lot of freedom and flexibility over the lifecycle, so you don't have to move all of your data, just the data that you need. Then our MPP engine can really quickly load the data from that graph storage up into memory and allow users to run MPP queries really fast against data that's been loaded from that pre-positioned storage. The net benefit is that users get the data they need when they need it, with very optimal pre-positioning of that data, and the admins have full control over the lifecycle. This has worked incredibly well for our existing customers at Cambridge Semantics.
But as we really look to deliver data at true enterprise scale and provide that overlay across all of the data in the business, this pre-positioning will work, but it can't keep up with what's required. Our engineering team has often wondered: can we apply the massively parallel AnzoGraph database to do data virtualization directly? That's exactly what we've done. Coming later this year is our breakthrough MPP data virtualization capability, which I'm now going to tell you about.
First off, the AnzoGraph engine is now capable of loading data into memory directly from any source system, all in parallel. Right within your data loading queries you can define connections to databases, apply lightweight mappings, pull from APIs, and pull from JSON, XML, and CSV formats, all in memory, all in parallel, directly in the graph engine. What we've observed is that you can load data really fast from these data sources, in some cases almost as fast as you can load it from pre-positioned local storage, and that's going to make a huge difference to customers that want to rapidly onboard data into the graph engine for use in data integration and all kinds of queries.
In addition to loading the data directly into memory, we can also do pushdown query planning that minimizes data movement. When a user issues a query against AnzoGraph, it can in turn apply views defined in the query to actually query that data directly in the databases in real time. So users benefit from the MPP in-memory query capabilities of Anzo and AnzoGraph without having to pre-position the data on disk first. This leads to much faster cycle times, faster deployment, and more flexibility; all the things the data fabric can provide are now available even faster.
So to summarize, with these new capabilities Anzo can actually support a hybrid structure. The reality is that for some use cases you want to pre-position data. For some use cases you want to load the data into memory before querying it. Other times you'll want to query in memory but have that go directly against the data source at query time. With Anzo's graphmart capability, you'll be able to apply all three of these capabilities to the same use case in a single graphmart.
You can have some data that's been pre-positioned and loaded into memory, some data that you load directly into memory from your sources, and other data that you're querying in real time. Depending on your use case, you can select the approach that works best for the data and the way it's queried, for the ultimate flexibility in data virtualization within the data fabric.
For more information, please visit cambridgesemantics.com and have a look at some of our blogs and white papers that talk about these exciting new features. Thank you.
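As a rough illustration of the pushdown idea mentioned in the walkthrough, the Python sketch below contrasts pulling every row from a source and filtering in memory with sending the filter to the source so that only matching rows are moved. It is not Anzo or AnzoGraph code: SQLite stands in for a source system, and the `orders` table, its columns, and the helper functions `make_source`, `load_then_filter`, and `pushdown_filter` are all hypothetical names chosen for the example.

```python
import sqlite3


def make_source():
    """Build a small in-memory SQLite database standing in for a source system (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    # 1000 synthetic rows; every fourth row belongs to the 'EMEA' region.
    rows = [(i, "EMEA" if i % 4 == 0 else "AMER", float(i)) for i in range(1, 1001)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def load_then_filter(conn, region):
    """Pull every row out of the source, then filter in memory.

    Returns the matching rows and the number of rows moved out of the source.
    """
    all_rows = conn.execute(
        "SELECT id, region, amount FROM orders ORDER BY id"
    ).fetchall()
    matching = [row for row in all_rows if row[1] == region]
    return matching, len(all_rows)


def pushdown_filter(conn, region):
    """Send the filter to the source so that only matching rows are moved.

    Returns the matching rows and the number of rows moved out of the source.
    """
    matching = conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ? ORDER BY id",
        (region,),
    ).fetchall()
    return matching, len(matching)


if __name__ == "__main__":
    source = make_source()
    full_result, full_moved = load_then_filter(source, "EMEA")
    pushed_result, pushed_moved = pushdown_filter(source, "EMEA")
    print("same answer:", full_result == pushed_result)
    print("rows moved without pushdown:", full_moved)
    print("rows moved with pushdown:", pushed_moved)
```

Both functions return the same rows; the difference is how much data leaves the source, which is the cost that pushdown planning tries to reduce.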