What happened with the V’s?

The Viya infrastructure competes against open source. Even so, it might generate sales by compelling applications built on top of it.

I remember when VA was touted as the “Tableau Killer”. But Tableau folks sleep soundly these nights.

I remember VS, and VDMML. (But who starts a product name with “VD”? 😂)

These were all well-funded applications -- expected to be great applications -- that could have driven sales of Viya.

What happened?

Comment! It's anonymous!

25 replies (most recent on top)

No.

"Which V* products made $$$?”

All we know is that none of them made enough $$$ to stop SAS’s revenue decline.

Most folks on this thread blame the underlying Viya/CAS platform. Would the V* products have succeeded on a different platform?

@kt+1jmf72xgb

You can use analytics to answer such questions. But here is an analytics company where you can't answer it. Oh, the irony...

@kc+1jmf72xgb

The reality of the situation is that no one, and I mean no one, could ever really answer the "which V* products made $$$" question. Bundling, giving something away for free to attempt to drive adoption or increase sales numbers, with no way to tangibly associate external customer work to a specific "offering" to determine what they're actually using, all obscured any answer to that question.

@jb+1jmf72xgb

How would we know that a Data Analyst wants to see the columns if the SAS culture blocked access to the Data Analysts?

The feature factory blog post was spot on with regards to describing SAS culture. Unless you are moving to a different feature factory, or have an extensive network, the lack of measurement and tangible effects of your efforts is a non-starter for a job search.

"That was the whole purpose of Advanced Analytics, a group you formed! You seemed quite upset at the admittedly incompetent leader who didn't get it. Well, you yelled at him for 5 minutes in the hall - maybe that was just your management style."

We all get angry at times. But rage and yelling at people publicly may be a sign consistent with NPD. You can recover, but it takes a lot of awareness to admit you have a problem and to do the hard work to resolve it.

With modern multi-core machines, there are parallelized optimizations for accessing physical columnar store row-groups from the underlying table when most or all columns are needed. Pretty sure CAS does some of this for its Parquet support. Not sure about the Compute Server.
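For anyone who wants to picture the row-group idea, here is a minimal sketch, not how CAS or Parquet readers actually do it. It assumes a toy in-memory table split into row groups (each group maps a column name to that group's values), and the names `make_row_groups`, `read_group` and `read_all_rows` are made up for illustration. Each worker rebuilds the full rows of one group, and the results are stitched back together in order.

```python
from concurrent.futures import ThreadPoolExecutor


def make_row_groups(num_groups, rows_per_group, columns):
    """Build a toy columnar table: a list of row groups, each {column: values}."""
    groups = []
    for g in range(num_groups):
        start = g * rows_per_group
        groups.append(
            {c: [f"{c}{start + i}" for i in range(rows_per_group)] for c in columns}
        )
    return groups


def read_group(group, columns):
    """Reassemble full rows for one row group from its column chunks."""
    return list(zip(*(group[c] for c in columns)))


def read_all_rows(row_groups, columns, workers=4):
    """Read every row group on a worker pool and concatenate the rows in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as row_groups.
        parts = pool.map(read_group, row_groups, [columns] * len(row_groups))
        return [row for part in parts for row in part]


groups = make_row_groups(num_groups=3, rows_per_group=4, columns=["a", "b", "c"])
rows = read_all_rows(groups, ["a", "b", "c"])
```

Python threads will not give a real speedup on CPU-bound work like this because of the GIL; the point is only the shape of the approach, where row groups are independent units that can be handed to separate cores.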

“They think it is cool and that alone will automatically make it a sales success”

If only we’d used “X” as the prefix. Now THAT would move product! XDDML, etc…

"Which of the V* products made meaningful money?"

A lot of techie talk with zero answers concerning which V* products made $$$. They think it is cool and that alone will automatically make it a sales success. That wrong mindset has persisted for over a decade and greatly explains why SAS is in a bind.

@js+1jmf72xgb You have me confused with someone else.

I do know a fair bit about software engineering. I remember the paper that @h3+1jmf72xgb referenced. It probably gave me the idea to use columnar storage. It seems to me that the V* products had cases where columnar storage would not have helped, and other cases where it would have helped a lot. But I did not work on those products so others may have better judgement.

I do not know why SAS needed any “path to microservices and CI/CD”. These were also mainstream ideas at the time. SAS was in fact very late in adopting CI/CD.

All these subjects are hard, while counting features is easy. I think that is why, in my time there, SAS managed products as Feature Factories.

https://cutle.fish/blog/12-signs-youre-working-in-a-feature-factory

- I should not have called it a “mistake”, knowing as little as I do about the V* products.

That hasn't stopped you from opining about software design, management or architecture ad nauseam. What do you think you know a lot about? I'd love to hear anything.

BTW, Visual Studio was SAS's path to microservices and CI/CD. I understand your confusion, though, because unlike HPA/LASR/CAS customers actually liked and bought it.

- I saw no common component that did what I needed, and nobody offered to generalize my component for reuse.

That was the whole purpose of Advanced Analytics, a group you formed! You seemed quite upset at the admittedly incompetent leader who didn't get it. Well, you yelled at him for 5 minutes in the hall - maybe that was just your management style.

- Management did not prioritize architecture, and it doesn’t happen by accident.

You were in charge ... ?

Admit it, your idea of architecture was shoehorning CAS into SAS's core so you could justify the billions spent on HPA.

@h3+1jmf72xgb

I should not have called it a “mistake”, knowing as little as I do about the V* products. Sometimes a data analyst wants to see all the columns. Perhaps that was their reason.

Where columnar storage helps, though, it helps a lot, and as your reference shows it was a known idea at the time.

“a high level of productivity and organizational cohesion.”

I saw no common component that did what I needed, and nobody offered to generalize my component for reuse. This was typical all over SAS. Management did not prioritize architecture, and it doesn’t happen by accident.

Architecture is hard. Counting features is easy. SAS was a Feature Factory.

@gz+1jmf72xgb

Exactly! The famous Google Dremel paper (that Apache Parquet is largely based on):

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf

came out in 2010

https://research.google/pubs/dremel-interactive-analysis-of-web-scale-datasets-2/

However, the history of columnar databases goes back much further

AI Search Results

Columnar databases originated in the late 1970s and early 1980s. They became more widely used in the 21st century with the rise of big data.
Pioneers
MonetDB: An open-source software that focuses on data mining
Vertica: A commercial database for data warehousing
Google's Bigtable: A key pioneer in the field

Kudos to you for being innovative and industrious enough to build a column store for your project at SAS. With all due respect though, SAS platform R&D should have been doing the necessary research and providing a common column store API, a native storage format (because SAS seems to always have to have their own) and ability to adapt to others like the original MVA engine architecture and CASLIBs provide. Doing so would allow any solution or product to have the capability without having to build something custom and therefore likely more limited.

SAS has a history of doing “almost the same thing” several different ways. I don’t think it’s an exaggeration to say that such redundancy cost hundreds of millions of dollars over the past 30 years. Sadly, most software companies of any size suffer from similar woes. There is seemingly always enough time to create yet another variant, but never enough time to make it elegant, flexible and extensible enough the first time around.

This is a product and engineering management challenge that can only be met by hiring bona fide A players who function at a high level of productivity and organizational cohesion. Does this sound like SAS between 2005 and 2020?

“a big problem with CAS was its design primarily for servicing the “V” solutions with row-wise access…”

Thanks for posting that.

On an unrelated project at SAS, I designed and implemented a columnar data store, so that our project could handle larger data sets.

This was around 2012. It was not unknown technology at that time.

Really surprised they made that mistake.

@g7+1jmf72xgb

Wasn't it intended to replace Enterprise Miner?

Which of the V* products made meaningful money?

Architecturally speaking, a big problem with CAS was its design primarily for servicing the “V” solutions with row-wise access as evolved from earlier LASR-based VA patterns. This required relational column projections to stride entire rows in RAM even when only a few columns in a wide table were required. Imagine the overhead when only a few columns in a very wide table were specified. Also, VA heavily depended on FCMP computed columns which made it very difficult to optimize physical row selection for WHERE processing. Much of this owes to even older traditional SAS row/var processing patterns. Were modern advances in data access for analytical processing even considered?
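To put a rough number on that stride overhead, here is a minimal sketch under made-up assumptions: a 1,000-row by 200-column table of doubles, a projection of two columns, and "bytes spanned by the scan" used as a crude proxy for memory traffic. It is not CAS code, and the function names are hypothetical. The row-wise scan has to walk across every full row to pick out two values, while the columnar scan only reads the two column buffers it needs.

```python
from array import array

NUM_ROWS = 1_000
NUM_COLS = 200        # a "very wide" table (hypothetical size)
WANTED = [3, 17]      # project only two columns


def cell(r, c):
    """Deterministic cell value so both layouts hold identical data."""
    return float(r * NUM_COLS + c)


# Row-wise layout: one flat buffer, each row's values stored contiguously.
row_store = array("d", (cell(r, c) for r in range(NUM_ROWS) for c in range(NUM_COLS)))

# Columnar layout: one contiguous buffer per column.
col_store = [array("d", (cell(r, c) for r in range(NUM_ROWS))) for c in range(NUM_COLS)]


def project_rowwise(store, wanted):
    """Pick the wanted columns by striding through every row of the flat buffer."""
    out = [[store[r * NUM_COLS + c] for r in range(NUM_ROWS)] for c in wanted]
    # The scan strides over whole rows, so it spans the entire buffer.
    spanned = len(store) * store.itemsize
    return out, spanned


def project_columnar(store, wanted):
    """Pick the wanted columns by reading only their own buffers."""
    out = [list(store[c]) for c in wanted]
    spanned = sum(len(store[c]) * store[c].itemsize for c in wanted)
    return out, spanned


row_out, row_bytes = project_rowwise(row_store, WANTED)
col_out, col_bytes = project_columnar(col_store, WANTED)
print(f"row-wise spans {row_bytes:,} bytes, columnar spans {col_bytes:,} bytes")
```

Both projections return the same values; in this model the gap in bytes spanned is simply total columns divided by requested columns, so the penalty grows as the table gets wider.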

Conversely, here’s what Spark is optimized for:

AI reply to query: “is apache spark optimized for columnar access”

Yes, Apache Spark is optimized for columnar access, particularly when using data formats like Apache Parquet, which stores data in a columnar format, leading to significantly improved performance for operations like filtering and aggregations on large datasets.
Key points about Spark and columnar access:

Parquet integration:
Spark leverages Parquet as a preferred storage format due to its columnar structure, allowing for efficient data retrieval by only accessing the relevant columns for a query.

Optimized operators:
Spark's internal operators are designed to work effectively with columnar data, enabling faster processing of large datasets.

Performance benefits:
Columnar access allows Spark to significantly reduce the amount of data that needs to be read from storage, leading to faster query execution times.

——————

Wasn’t Parquet eventually integrated into CAS? How is that working out? Does it perform well?

Coming forward to even more modern analytical data processing optimizations:

https://medium.com/@zujkanovic/exploring-duckdb-and-the-columnar-advantage-f7beb8cbf478

VDMML (Visual Data Mining and Machine Learning)....I heard several folk tried to convince JG that it should be just VML (Visual Machine Learning), but JG insisted on keeping Data Mining in the name because he thought it was still a cool and relevant term....that's how out of touch he was.

@fg+1jmf72xgb

Of course it did. Why would they change their "winning formula" for these products?

"V" stands for "Vanity", not "Visual" -- they were all someone's vanity project.

"My observation was that complexity was not managed (through any efforts to avoid complexity or to simplify, refactor, or redesign)."

I saw this so many times at SAS. Managers were rewarded for producing features. They were never rewarded for simplifying the design — even though that tends to produce better, faster, cheaper software.

It takes effort for management to understand design. It’s so much easier to count features.

At the systems level, “Feature Factories” produce complex code that’s expensive to debug and enhance. At the application level, they also produce complex user interfaces that are hard for customers to learn and use.

I don’t know whether either effect occurred on the V* products; it would be unsurprising if both did. Perhaps someone with personal experience will post.

@bp+1jmf72xgb

They hand-picked bunches of kids from Art school to make these look pretty. Are you saying that the efforts of the Art Department, which is safe, lack appeal?

by then, the CAS in-memory Table design and its I/O and related functionality had grown to be very complex

My observation was that complexity was not managed (through any efforts to avoid complexity or to simplify, refactor, or redesign). Rather, it grew unbounded, and that growth was encouraged by a never-ending management stream of ideas for additions and, of course consequentially, many fixes for complex defects.

initially designed for on-premise deployments using single SMP machine or MPP grid architectures. Significant design decisions and optimizations necessary to run efficiently …

Which is the nature of evolving software systems - malleable and brittle, yet largely static due to the constant pressure to ship the product instead of shipping the right product or a quality product.

Viya infrastructure was designed for cloud computing possibilities, yet definitely not as a “cloud first” architecture.

2012-2014: CAS and Java microservice teams had, collectively, at best minimal working knowledge of public cloud APIs and of the costs associated with scaling up workloads on various cloud sizing configurations. A few individuals and a couple of subgroups were focused on building this expertise, which gradually ramped up over the next 3 to 5 years.

By 2018, Viya infrastructure was commonly deployed in part or whole by developers, testers, and performance specialists into various AWS, Azure and GCP clouds. However, by then, the CAS in-memory Table design and its I/O and related functionality had grown to be very complex, reflecting an evolution of SASHDAT tables/files going back to LASR, yet with support for new CASLIB types and their respective storage formats.

Most of this, with the exception of S3 support, was initially designed for on-premise deployments using single SMP machine or MPP grid architectures. Significant design decisions and optimizations necessary to run efficiently and cost-effectively in public cloud infrastructures were not initially baked into CAS. The accrued consequences are a direct result of not prioritizing a cloud first CAS design in the beginning.

There was also the rapid evolution of containers and Kubernetes occurring in parallel with the timeframe in which CAS arose. Thus, a longer initial R&D effort would’ve been needed to capture the operational requirements and considerable implementation changes necessary for CAS and related components to be made truly cloud native from the get-go. Sadly, this did not occur, and the price of remediation is apparently still being paid.

Also, the branding was beyond stupid. Visual this and visual that? I'd bet a significant fraction of potential customers didn't even give the products a look based on the names alone.

They were (probably still are) ridiculously expensive to license and require all data to be loaded into memory which necessitated expensive hardware or high cloud costs.

For 95% of use cases, customers can get the same functionality for a fraction of the cost with other software.

Thread regarding SAS Institute layoffs
