
Anecdotal comment: it looks like Amazon is putting an audio player at the top of each press release to showcase their "Amazon Polly" deep learning voiceover. At 3:18 in the audio it starts reading the entire table and (expectedly) screws it up, though not too badly.

However it's a great way of proving that we are not ready for computers to read to us yet.



The audio is generated by Amazon Polly; see https://aws.amazon.com/blogs/aws/give-your-wordpress-blog-a-... for more info!

I now use the audio version as a second form of proofreading, and find that it helps me to find places where my written transitions could be better.

I agree that there's room to make the contents of the table sound better, but I am not sure what direction this should go in. Suggestions are welcome!


Hi Jeff, here's a suggestion: When entering a table, maybe Polly can announce the table and read each data cell followed by the <th> descriptor for that cell.

Example:

"Table 1: Row 1: [Instance name: z1d.large] ... [vCPUs: 2] ... [Memory: 16 GB] ... [Local Storage: 1 x 75 GB NVMe SSD] ... [EBS-Optimized Bandwidth: Up to 2.333 Gbps] ... [Network Bandwidth: Up to 10 Gbps] ... [pause] ... Row 2: [...]"

This is how a human would read it, so Polly should do it that way too. What a human actually ends up doing, though, is referencing the preceding row for each subsequent row. Something like "z1d.xlarge has double the vCPUs, double the Memory, and double the Local Storage, with EBS-Optimized Bandwidth and Network Bandwidth the same." -- I don't think you are at that point yet with Polly ;)
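
A rough sketch of the cell-by-cell reading suggested above, for anyone who wants to try it. Everything here is an assumption for illustration: `narrate_table` is a hypothetical helper, not part of Polly or the WordPress plugin, the table is assumed to be already parsed into a header list and row lists, and the `...` and `[pause]` markers are plain-text placeholders copied from the example rather than real speech markup.

```python
from typing import List, Sequence


def narrate_table(
    headers: Sequence[str],
    rows: Sequence[Sequence[str]],
    table_number: int = 1,
) -> str:
    """Build a reading script in which every cell is announced with its column header."""
    parts: List[str] = [f"Table {table_number}:"]
    for row_index, row in enumerate(rows, start=1):
        # Pair each cell with the header of its column, e.g. "vCPUs: 2".
        cells = [f"{header}: {value}" for header, value in zip(headers, row)]
        # "[pause]" is a placeholder marking the break between rows.
        parts.append(f"Row {row_index}: " + " ... ".join(cells) + " [pause]")
    return " ".join(parts)


headers = ["Instance name", "vCPUs", "Memory"]
rows = [
    ["z1d.large", "2", "16 GB"],
    ["z1d.xlarge", "4", "32 GB"],
]
script = narrate_table(headers, rows)
print(script)
```

The resulting string could then be fed to whatever text-to-speech step the plugin already uses. The "compare with the previous row" style of reading is deliberately left out, since that needs real understanding of the values.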


Hmmm - cool idea, but definitely ambitious and re:Invent is almost here.

The plugin is open source and we welcome PRs at https://github.com/awslabs/amazon-polly-wordpress-plugin . Feel free to code something up and give it a try :-)


Even better would be to replicate screen reader behavior—not only does it handle tables well, but it can interface with all semantic elements (buttons, navs, links, images, etc) in a well-defined manner.


Would be good if the player had an option to speed up, e.g. 2x.


We just gave Polly the ability to ensure that a given block of text is spoken within a specified period of time:

https://aws.amazon.com/blogs/aws/amazon-polly-update-time-dr...

I am messaging the team to see how we might use this for the plugin.
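
For anyone curious, a minimal sketch of how a block of text might be wrapped for this, assuming the feature is exposed through Polly's SSML `<prosody>` tag with an `amazon:max-duration` attribute (this comes from the Polly SSML documentation as I understand it, not from the comment above). The helper name `wrap_with_max_duration` is hypothetical, and the snippet only builds the SSML string; it does not call Polly.

```python
from xml.sax.saxutils import escape


def wrap_with_max_duration(text: str, seconds: float) -> str:
    """Return SSML asking for `text` to be spoken within `seconds`.

    Assumes the `amazon:max-duration` prosody attribute; check the Polly docs.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Escape &, < and > so characters in the post cannot break the markup.
    body = escape(text)
    return (
        f'<speak><prosody amazon:max-duration="{seconds:g}s">'
        f"{body}</prosody></speak>"
    )


ssml = wrap_with_max_duration("Memory: 16 GB & up", 5)
print(ssml)
```

How this would translate into a listener-facing speed control in the plugin is an open question; a fixed maximum duration per block is not the same thing as a 2x playback button.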


Hi. I can get a physical (bare metal in newspeak) 8-HT Xeon with 64G RAM and 1TB RAID1 SSD for ~$100/month, with a 1GB NIC uplink (corrected from 10GB). How much is the equivalent z1d.2xlarge EC2 instance?


Where? Asking for a friend :-)


There are multiple providers. The one linked below is based in Europe but has datacenters in the USA. It is not the cheapest and does not have the best hardware specs, but it is good enough. A few Google searches will turn up alternatives if you care.

The Xeons are not the latest or fastest models around, but they are good enough. The difference from newer models with higher clock speeds will probably be 1-10% in processing power for most workloads, unless you are _very_ CPU bound.

https://www.arsys.net/servers/dedicated

Of course this is _not_ a 1:1 comparison with EC2. These are unmanaged bare metal servers (with good HTTP APIs and web panels for admin, though), but sometimes that is good enough and you can save a good chunk of money. You can bring a server up or down almost instantly (well, within minutes), and you won't be charged if you are not using it.


Ah, there's also hetzner.com which I've had a good experience with.


> to showcase their "Amazon Polly" deep learning voiceover

It may also be an attempt at making the content accessible to blind or otherwise visual-reading-impaired users.


Text is pretty accessible to visually impaired or reading-impaired users. Screen reader tech is pretty good at handling articles like this. Anybody who needs it probably has a better solution than Polly.

It is a cool way to demo their tech, though.


I think the majority of the perceived screwed-upness comes from its failure to enunciate "vCPU" and "NVMe", so I think it'll sound a lot better once they fix this bug.


Very interesting observation, I waited 3 minutes for it to read the table :P

How would a human read a table, though? I think tables are mainly for the eyes; reading them aloud seems wasteful.


I think it sounds pretty good for normal text though.



